EXECUTIVE SUMMARY

This report contains the results of a Phase I Small Business Innovation Research (SBIR) project
funded by DARPA under Topic 92-049, Wavelets and Failure Prediction. The objectives of this
project were: 1) to assess the efficacy of wavelet techniques for selecting features upon which simple
and reliable classifiers could base decisions regarding abnormal changes in system behavior, and
2) to develop and test an algorithm for detecting failures in vibrating systems such as
gearboxes and pumps. For reliable and robust classification with low false alarm rates, the
selected features must be robust and maximally informative. That is, these features must: 1)
reliably persist in the presence of noise and disturbances, and 2) provide maximally
distinguishable characteristics for the detection and classification of failure modes. In this effort,
we have achieved these objectives, and have also developed and tested a preliminary version of a
failure detection methodology that is computationally simple, reliable, and robust.

Wavelets offer several different ways to access the structure of a signal in time and
frequency. In this project, we found that the continuous wavelet transform (CWT) provided an
ideal tool for identifying significant features for fault detection in vibrating systems such as gearboxes,
and for providing the basis for extracting those features. Like other image-visualization techniques
such as the short-time discrete Fourier transform (DFT), the CWT converts a one-dimensional
signal into a two-dimensional image. However, the CWT eliminates windowing artifacts and
provides more flexibility to trade time resolution for frequency resolution. In the CWT, each line
of the image corresponds to the time series response of one of the filters in a constant-Q filter bank
driven by the observed sensor data. This filter bank provides complete coverage of time and
frequency behavior. The resulting CWT image provides an extremely useful visualization of the
structure of the sensor signals; by comparing images from varying conditions, one can identify a
comparatively small number of features that are robust and maximally informative.
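The filter-bank view of the CWT described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this project: the Kiang wavelet is not reproduced here, so a Gaussian (Morlet-style) band-pass response stands in for it, and the function name `cwt_image`, the quality factor `q`, and the log-spaced analysis frequencies are illustrative choices. Each row of the returned array is the magnitude of one constant-Q filter's output, i.e., one line of the CWT image.

```python
import numpy as np

def cwt_image(x, fs, center_freqs, q=8.0):
    """Magnitude CWT image computed as a constant-Q filter bank.

    Row k is |output| of a complex band-pass filter centered at center_freqs[k]
    with a Gaussian frequency response of standard deviation center_freqs[k] / q.
    The Morlet-style Gaussian response is an assumed stand-in for the Kiang wavelet;
    filters are scaled to unit peak gain rather than the usual CWT normalization.
    """
    x = np.asarray(x, dtype=float)
    freqs = np.fft.fftfreq(len(x), d=1.0 / fs)   # FFT bin frequencies in Hz
    spectrum = np.fft.fft(x)
    image = np.empty((len(center_freqs), len(x)))
    for row, fc in enumerate(center_freqs):
        sigma_f = fc / q                          # bandwidth grows with fc: constant Q
        # Keep positive frequencies only, so each row is an analytic (complex) signal.
        response = np.exp(-0.5 * ((freqs - fc) / sigma_f) ** 2) * (freqs > 0)
        image[row] = np.abs(np.fft.ifft(spectrum * response))
    return image

# Example: 250 msec of a 2 kHz tone sampled at 48 kHz, analyzed from 0.5 to 16 kHz.
fs = 48_000
t = np.arange(int(0.25 * fs)) / fs
tone = np.sin(2 * np.pi * 2000 * t)
analysis_freqs = np.geomspace(500, 16_000, 16)    # three filters per octave
img = cwt_image(tone, fs, analysis_freqs)
print(img.shape, analysis_freqs[img.mean(axis=1).argmax()])
```

Because every filter's bandwidth is proportional to its center frequency, high-frequency rows have fine time resolution and coarse frequency resolution, and low-frequency rows the reverse; this is the time/frequency trade-off referred to above.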

The ability to visually identify critical features during the algorithm design phase leads to a
second, even more important consequence in the implementation phase: it makes the job of the
adaptive classifier far easier. Because so much of the signal structure is obvious, and because the
need for robust features leads one to focus only on high-energy areas of the CWT images, the
number of features needed for classification can be astonishingly small. In comparison to feature
sets based on the Discrete Fourier Transform, which may have from 256 to 16384 elements in a
feature vector for the faulty systems examined here, we found that fewer than 20 features sufficed to
obtain reliable separation among classes for the gearbox and pump data available to this effort.
Small feature sets lead to extremely simple three-layer artificial neural network (ANN) classifiers,
on the order of 50 processing elements, which could then be designed with noteworthy ease and
without the need for exotic training algorithms. Also, the amount of data needed to train such
simple nets is quite small: we were able to train classifiers using only 500 milliseconds of data
from each sample case.

The result of this effort is an extremely flexible and powerful methodology for exploiting
wavelet techniques to detect failures in vibrating systems. The essential elements of this
methodology are: 1) an off-line set of techniques to identify high-energy, statistically significant
features in the CWT; 2) a wavelet-based preprocessor to extract the most useful features from the
sensor signal; and 3) simple ANNs (incorporating a mechanism to defer any
decision if the current feature sample is determined to be ambiguous) for the subsequent
classification task. On the gearbox and pump data sets used in this study, the algorithms designed
using this method achieved perfect detection performance (1.000 probability of detection and
0.000 false alarm probability), with a probability of less than 0.04 that a decision would be deferred
for a few milliseconds, again based on only 500 milliseconds of data from each sample case.
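The decision-deferral mechanism can be sketched as a simple rule on the ANN's output vector. Section 2.2 of this report describes the rule as suppressing a classification when the largest output is less than some multiple of the next largest; the sketch below follows that description, but the ratio value of 2.0, the function names, and the assumption that outputs are non-negative (e.g., sigmoid units) are illustrative choices, not parameters taken from the study.

```python
import numpy as np

DEFER = -1  # sentinel: no decision on this feature vector; wait for the next one

def classify_or_defer(outputs, ratio=2.0):
    """Return the index of the winning class, or DEFER if the result is ambiguous.

    Assumes non-negative network outputs (e.g., sigmoid units). The decision is
    deferred when the largest output is less than `ratio` times the runner-up.
    """
    outputs = np.asarray(outputs, dtype=float)
    order = np.argsort(outputs)[::-1]             # class indices, largest output first
    best, runner_up = outputs[order[0]], outputs[order[1]]
    if best < ratio * runner_up:
        return DEFER
    return int(order[0])

def first_decision(output_stream, ratio=2.0):
    """Consume successive output vectors until one is unambiguous.

    Returns (class_index, number_of_deferrals); class_index is DEFER if the
    stream ran out before any confident decision was reached.
    """
    deferrals = 0
    for outputs in output_stream:
        label = classify_or_defer(outputs, ratio)
        if label != DEFER:
            return label, deferrals
        deferrals += 1
    return DEFER, deferrals

# Example: the first vector is ambiguous (0.5 vs 0.4), the second is not.
print(first_decision([[0.5, 0.4, 0.1], [0.1, 0.8, 0.2]]))  # prints (1, 1)
```

Raising `ratio` defers more often and makes each committed decision more conservative, which is the kind of trade-off between deferral rate and false alarms that the report examines.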

While the full CWT may represent a significant computational requirement, it is only used
in the off-line design phase to identify critical features. The final implementation of a fault
detection system consists of a comparatively simple wavelet-based preprocessor (probably
implementable as a single-chip custom integrated circuit, at least for audio-frequency processing),
followed by a very simple ANN: a configuration ideally suited to real-time implementation using
either digital or analog hardware.

This  effort  shows  the  exceptional  promise  of  our  wavelet-based  method  for  failure 
detection  in  vibrating  systems.  There  is  no  known  impediment  to  fielding  this  technology  within 
an  additional  24  months  for  simple  systems  such  as  terrestrial  pumps,  electrical  machinery,  or 
engines.  However,  more  demanding  applications,  such  as  machinery  (gearboxes)  mounted  on 
moving  platforms  (helicopters)  that  have  other  sources  of  high-energy  vibration  (engines),  raise 
some additional technical issues that could provide the focus for a Phase II effort.

In particular, the data used in Phase I were taken from a test stand or under mild operating
conditions, and thus do not display the full array of environmental disturbances and exogenous
sources of vibration that will make applications to helicopter gearboxes more difficult. The fault
detector must be able to separate unexpected variations in environmental vibrations from
unexpected changes in the gearbox which may indicate a developing failure. Thus one must
substantially expand the set of conditions under which data have been collected to include the full
range of operating conditions.

To counter the additional complexity introduced by uncontrolled environmental vibrations,
there are two critical ways in which the performance of our methods can be significantly enhanced.
(While such enhancements were completely unnecessary for the data sets used in Phase I, we fully
expect that they would be needed in an operational helicopter environment.) First,
throughout the Phase I effort only a single channel of sensor data was used in any of the algorithm
designs and tests. However, several sensor channels will typically be available for processing
(e.g., for the data used in Phase I there were two accelerometer channels). In the comparatively
benign environment of test stand data, one channel proved sufficient for robust and reliable
detection performance, but in an operational environment one would expect that achieving such
reliable performance would require the combined use of all sensor channels in an integrated system
in which wavelet-based features are extracted jointly from the full set of channels. Intuitively,
some sensors may provide information on background vibrations that may mask or interfere with
the features crucial to fault detection, and this background must be taken into account during on-line
feature extraction. In particular, phase differences between the CWTs of case- and support-mounted
sensors may provide significant information concerning the physical origin of the
vibrations at each frequency of interest.

TR-567

ALPHATECH, INC.
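One way to make the phase-difference idea concrete is to compare the complex CWT coefficients of two channels at a single analysis frequency. The sketch below is a hypothetical illustration, not a method developed in Phase I: it uses an assumed Gaussian (Morlet-style) band-pass filter in place of the Kiang wavelet, and the function names, quality factor `q`, and synthetic test signals are illustrative. The angle of the averaged cross product gives the mean phase lead of the first channel over the second at that frequency.

```python
import numpy as np

def bandpass_analytic(x, fs, fc, q=8.0):
    """Complex output of one Gaussian constant-Q band-pass filter centered at fc."""
    x = np.asarray(x, dtype=float)
    freqs = np.fft.fftfreq(len(x), d=1.0 / fs)
    response = np.exp(-0.5 * ((freqs - fc) / (fc / q)) ** 2) * (freqs > 0)
    return np.fft.ifft(np.fft.fft(x) * response)

def mean_phase_difference(x_a, x_b, fs, fc, q=8.0):
    """Average phase lead of channel a over channel b at fc, in radians [-pi, pi]."""
    a = bandpass_analytic(x_a, fs, fc, q)
    b = bandpass_analytic(x_b, fs, fc, q)
    # Averaging the complex cross product weights each instant by its amplitude
    # and avoids the wrap-around problems of averaging raw angles.
    return float(np.angle(np.mean(a * np.conj(b))))

fs = 48_000
t = np.arange(int(0.25 * fs)) / fs
case_sensor = np.sin(2 * np.pi * 2000 * t)
support_sensor = np.sin(2 * np.pi * 2000 * t - 0.5)   # same tone, lagging by 0.5 rad
print(mean_phase_difference(case_sensor, support_sensor, fs, 2000.0))  # about 0.5
```

How such phase measurements would map onto specific vibration sources in an operational gearbox is a question the Phase I data could not address.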

In addition, one would also expect that other helicopter system sensors might be of
considerable use in providing situational data, i.e., to identify environmental conditions and other
vibrational sources influencing the gearbox sensors, thus providing useful inputs to the failure
detection algorithm. For example, one could adapt the classification thresholds and parameters to
the amount of torque applied by the engines to the drive shafts (which affects both the intensity and
harmonic structure of a gearbox signal), or to gross flight conditions (e.g., grounded, vertical
ascent, hovering, cruise).

In  addition  to  these  enhancements,  there  are  also  several  other  issues  of  some  importance  to 
an  operational  design.  One  such  issue  is  the  context  in  which  the  fault  detection  system  is  to 
operate.  For  example,  is  the  goal  simply  to  detect  failure,  or  is  detailed  failure  diagnosis  desired  in 
order  to  guide  maintenance  activities?  What  response  time  is  needed  for  the  detection  system  in 
order  to  avoid  catastrophic  failure?  Requirements  for  system-level  objectives  such  as  these  will 
obviously  be  needed  in  setting  algorithm  parameters  and  for  assessing  overall  performance  against 
these  objectives. 

Just as obviously, the practical implementation of the algorithm must be considered.
Indeed, as we have indicated, our algorithms appear to be suited to either digital or analog
implementation, and issues such as sizing and computational speed requirements would need to be
addressed.

Thus  there  is  much  to  do  in  order  to  turn  these  results  into  a  final  product  suitable  for  flight 
testing  on  operational  helicopters.  However,  the  exceptional  performance  achieved  in  Phase  I 
clearly  indicates  that  the  potential  of  our  wavelet-based  failure  detection  method  merits  the 
additional  effort  required  to  realize  it. 




CONTENTS

EXECUTIVE SUMMARY
LIST OF FIGURES
LIST OF TABLES
COLOR PLATES
1 INTRODUCTION
   1.1 Identification and Significance of the Problem
   1.2 Objectives of Phase I
   1.3 Overview of Phase I Results
   1.4 Report Organization
2 PHASE I TECHNICAL EFFORT
   2.1 Data Used in Phase I
   2.2 System Structure
   2.3 Wavelet-Based Tunable Preprocessor
   2.4 Feature Separation
   2.5 Artificial Neural Network Classifier
   2.6 Fault Detection and Identification Results from Phase I
   2.7 Analysis of False Alarm and Deferral Probabilities
3 CONCLUSIONS AND RECOMMENDATIONS
   3.1 Conclusions from the Phase I Effort
   3.2 Phase II Recommendations
REFERENCES
APPENDIX A: THE CONTINUOUS WAVELET TRANSFORM
APPENDIX B: BASIC MATHEMATICAL ASPECTS OF THE CWT




LIST OF FIGURES

2-1 Illustration of Intermediate Gearbox
2-2 Incipient Fault Detection and Classification System Structure
2-3 Tunable Feature Extractor
2-4 False-Alarm Time History for 0% Deferral Rate
2-5 False-Alarm Interarrival-Time Histogram for 0% Deferral Rate
2-6 False-Alarm Time History for 3.7% Deferral Rate
2-7 False-Alarm Interarrival-Time Histogram for 3.7% Deferral Rate
2-8 False-Alarm Time History for 5.7% Deferral Rate
2-9 Accelerometer Readings for First 4 Seconds of Channel 5: Normal Case
2-10 Deferral Time History for 3.7% Deferral Rate
2-11 Deferral Time History for 5.7% Deferral Rate
A-1 The Haar and Kiang Wavelets
A-2 Dilated Translates of the Haar Wavelet
A-3 Typical Subdivision by Wavelets of the Time-Frequency Plane
A-4 Classification of Wavelets




LIST OF TABLES

2-1 CONDENSATE PUMP INFORMATION
2-2 FIRE PUMP INFORMATION
2-3 PHASE I CLUSTER SEPARATIONS, HELICOPTER DATA
2-4 NUMBER OF PROCESSING ELEMENTS PER ANN LAYER
2-5 PREPROCESSOR TIME CONSTANTS AND FEATURE VECTOR RATES
2-6 PHASE I PERFORMANCE RESULTS
2-7 FALSE ALARMS AND DEFERRALS FOR HELICOPTER GEARBOX
2-8 FALSE-ALARM INTERARRIVAL-TIME STATISTICS FOR SEVERAL DEFERRAL RATES
2-9 DEFERRAL INTERARRIVAL-TIME STATISTICS




COLOR PLATES

Plate A. Effect of Smoothing the Continuous Wavelet Transform
Plate B. Effect of Decreasing the Kiang Wavelet Frequency Resolution
Plate C. Condensate Pump Axial and Radial Data Channels in the CWT Domain
Plate D. Channel 5 Helicopter Gearbox Fault Condition Signatures in the CWT Domain
Plate E. Channel 6 Helicopter Gearbox Fault Condition Signatures in the CWT Domain
Plate F. Channel 5 Helicopter Gearbox Signatures After Masking Out Low-Level Energy
Plate G. Channel 6 Helicopter Gearbox Signatures After Masking Out Low-Level Energy
Plate H. Condensate Pump Signatures in the CWT Domain
Plate I. Helicopter Gearbox Feature Cluster Separation
Plate J. Condensate Pump Feature Cluster Separation
Plate K. Fire Pump Feature Cluster Separation
Plate L. Artifacts in Training Data: Helicopter Gearbox
Plate M. Continuous Wavelet Transforms of a Pulse and a Sine
Plate N. Continuous Wavelet Transforms of Pulse Plus Sine and Two Sines
Plate O. Continuous Wavelet Transforms of Poisson and Gaussian White Noises




Plate A. Effect of Smoothing the Continuous Wavelet Transform

Plate C. Condensate Pump Axial and Radial Data Channels in the CWT Domain

Plate D. Channel 5 Helicopter Gearbox Fault Condition Signatures in the CWT Domain
(panels: Normal, Inner Race, Outer Race, Gear Spall, Rolling Element, Tooth Cut)

Plate H. Condensate Pump Signatures in the CWT Domain

Plate I. Helicopter Gearbox Feature Cluster Separation
(axis: Second Harmonic Power Level, dB relative to fundamental)

Plate J. Condensate Pump Feature Cluster Separation
SECTION 1
INTRODUCTION

This report presents the results of a Phase I Small Business Innovation Research project
funded by DARPA under Topic 92-049, Wavelets and Failure Prediction. Recommendations for a
Phase II follow-on are also included.

1.1  IDENTIFICATION  AND  SIGNIFICANCE  OF  THE  PROBLEM 

The timely and reliable detection of changes in the dynamic behavior of complex systems
and signals is a problem of considerable importance in a vast array of military and civilian
applications. As we continue to place increasingly demanding objectives on system performance,
cost, and reliability, the need for such detection methods, and the requirements placed on them, grow
commensurately. For example, the increasing role of and reliance on computer control (for the
fly-by-wire control of advanced high-performance aircraft and helicopters, the navigation of
autonomous vehicles, etc.) makes the detection of system anomalies essential, since by their very
nature such automatic systems simply do not have the luxury of relying on the extraordinary but
workload-limited detection capabilities of their human pilots. Also, the cost of modern-day
military systems is such that there are tremendous payoffs to be gained if the availability of a
weapons system is improved, or its life cycle cost reduced.

These objectives provided much of the motivation for the development of self-repairing
flight control system concepts (Weiss and Hsu, 1987) for the in-flight detection of battle damage
and sensor and actuator failures in advanced aircraft, in order 1) to facilitate control system
reconfiguration to allow mission completion (or at least the safe return of the vehicle), and 2) to
provide early diagnosis of problems that could then speed up the maintenance process and reduce
turn-around time.
Furthermore, the reliable detection of component damage or failure can have a dramatic
effect on the cost of maintaining and/or replacing an advanced military vehicle such as a helicopter,
ship, fighter, or even an unmanned vehicle. Specifically, the total cost of such a system is so high
that the objective of avoiding system loss due to an undetected failure in some component places
severe demands on the overall reliability of components and their monitoring systems.

Moreover,  this  need  for  reliability  has  typically  led  to  extremely  conservative  maintenance 
and  replacement  procedures:  components  are  automatically  replaced  after  time  in  service  reaches  a 
prescribed  limit,  usually  taken  to  be  significantly  less  than  their  expected  failure  times.  Thus  the 
availability  of  advanced  and  reliable  fault  detection  systems  offers  the  promise  not  only  of 
improved  system  reliability  but  also  the  possibility  of  increasing  component  time  in  service  by 
detecting  the  onset  of  problems  and  thus  allowing  “retirement  for  cause”  rather  than  the  more 
expensive present practice of replacing components whether or not they need it.

These  and  a  variety  of  other  factors  and  applications  have  led  to  considerable  research  and 
development  activity  over  the  past  20  years  resulting  in  an  array  of  detection  and  diagnosis 
methods  (see,  for  example,  the  widely  referenced  surveys  Willsky  (1976)  and  Basseville  (1987)) 
providing  us  with  an  analytically  sound,  proven-in-practice  foundation  from  which  to  pursue  the 
new  challenges  arising  as  we  push  harder  on  the  envelope  of  performance,  reliability,  and  cost. 
Moreover, in the past few years significant new methods of signal analysis and pattern recognition
(in  particular,  wavelet  transforms  and  artificial  neural  networks)  have  been  developed  offering  the 
promise  of  adding  significantly  to  the  arsenal  of  detection  methods  and  to  the  range  of  applications 
that  can  be  dealt  with  successfully. 

1.2  OBJECTIVES  OF  PHASE  I 

The objective of the Phase I effort was to assess the efficacy of wavelet techniques for selecting
features upon which an adaptive classifier could base its decisions regarding abnormal
changes in system behavior. For reliable, robust classification with low false alarm rates, these
features must be:
• high-energy in at least one case (normal or failed), in order to persist reliably even in the
presence of environmental noise or transient disturbances; and

• statistically significant in separating two or more cases from one another, in order to
contribute meaningful information to the pattern classifier.
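The two screening criteria above can be expressed as a simple filter over candidate features, for example the mean energies of CWT sub-bands. The sketch below is one possible reading, not the procedure used in the study: it measures "high energy" in dB relative to the strongest candidate band, and measures separation with a normalized difference of class means (a d'-style statistic); the -30 dB and 3.0 thresholds, the function name, and the toy data are all assumptions.

```python
import numpy as np

def select_features(normal, faulty, min_level_db=-30.0, min_separation=3.0):
    """Indices of candidate features that pass both screening criteria.

    normal, faulty: arrays of shape (n_observations, n_candidates) of band
    energies (linear power units) measured on normal and on faulty units.
    """
    normal = np.asarray(normal, dtype=float)
    faulty = np.asarray(faulty, dtype=float)
    mu_n, mu_f = normal.mean(axis=0), faulty.mean(axis=0)
    # Criterion 1: high energy in at least one case, relative to the strongest band.
    strongest = np.maximum(mu_n, mu_f)
    level_db = 10.0 * np.log10(np.maximum(strongest / strongest.max(), 1e-30))
    high_energy = level_db >= min_level_db
    # Criterion 2: class means far apart compared with the within-class spread.
    pooled_sd = np.sqrt(0.5 * (normal.var(axis=0) + faulty.var(axis=0)))
    separation = np.abs(mu_n - mu_f) / np.maximum(pooled_sd, 1e-12)
    significant = separation >= min_separation
    return np.flatnonzero(high_energy & significant)

normal = np.array([[0.9, 1.0, 1e-6], [1.0, 1.1, 1e-6], [1.1, 0.9, 1e-6]])
faulty = np.array([[1.9, 1.0, 2e-6], [2.0, 1.1, 2e-6], [2.1, 0.9, 2e-6]])
print(select_features(normal, faulty))  # [0]: band 1 does not separate, band 2 is too weak
```

Requiring both tests at once is what keeps the feature vector short: a band that separates the classes but carries almost no energy (band 2 here) is exactly the kind of feature likely to be swamped by noise in the field.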

Our  objective  was  not  to  develop  new  classification  techniques,  but  rather  to  modify  off- 
the-shelf  artificial  neural  network  (ANN)  (Lau  and  Widrow,  1990a,  1990b)  technology  as  needed 
to  integrate  it  with  a  front-end  feature  extractor  based  on  wavelet  techniques. 

Wavelets offer many different ways to access the structure of a signal in time/scale space.
The continuous wavelet transform (CWT) (Ruskai, Beylkin, et al., 1992; Daubechies, 1990;
Mallat, 1989a, 1989b; Meyer, 1988) converts a time signal into an image, from which features can
be extracted using image processing techniques. The wavelet packet transform (WPT) (Coifman
and Wickerhauser, 1992; Coifman et al., 1990) derives coefficients of wavelet basis functions that
characterize the time/scale energy distribution in a much more flexible manner than discrete Fourier
transforms (DFTs) permit. Variations on the WPT permit the selection of subsets of an
overcomplete set of basis functions to find the most significant elements of a signal. More recent
extensions to wavelet techniques, presented under the general heading of multiscale signal
processing, create even more options for feature characterization. Our initial goal was to select
several alternatives and to compare their performance and computational requirements in the
context of whatever data were available. As the effort progressed, we focused our attention on
CWTs and WPTs.
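For readers unfamiliar with the WPT, the sketch below computes a full wavelet packet tree using the Haar filter pair (the simplest choice, and one of the wavelets shown in Appendix A; the wavelets actually used with the WPT in Phase I are not specified here). Unlike the ordinary discrete wavelet transform, which splits only the low-pass branch, every node is split into low-pass and high-pass children, and the leaf energies give a time/scale energy distribution. This is an illustrative sketch with hypothetical function names; it does not implement the best-basis selection of Coifman and Wickerhauser.

```python
import numpy as np

def haar_packet_leaves(x, levels):
    """Return the 2**levels leaf coefficient arrays of a full Haar wavelet packet tree.

    Leaves are returned in natural (Paley) order, which is not strictly increasing
    frequency order. len(x) must be divisible by 2**levels.
    """
    x = np.asarray(x, dtype=float)
    if len(x) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    nodes = [x]
    for _ in range(levels):
        children = []
        for c in nodes:
            even, odd = c[0::2], c[1::2]
            children.append((even + odd) / np.sqrt(2.0))  # low-pass (scaled sum)
            children.append((even - odd) / np.sqrt(2.0))  # high-pass (scaled difference)
        nodes = children
    return nodes

def leaf_energies(x, levels):
    """Energy in each leaf; the transform is orthonormal, so these sum to sum(x**2)."""
    return np.array([np.sum(c ** 2) for c in haar_packet_leaves(x, levels)])

print(leaf_energies(np.ones(8), 3))  # a constant signal puts all its energy in leaf 0
```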

Because the dominant challenge in failure detection problems is to identify a concise yet
distinctive set of features on which the detection/classification process can be made to depend, our
emphasis in Phase I was on the back end of the feature-selection process; i.e., we initially
assumed that the full set of wavelet transform coefficients was already available, and then
determined which subset was most critical to good performance of an ANN classifier. Helicopter
gearbox and shipboard pump accelerometer data, supplied by the Navy, were passed through
CWT and WPT preprocessors and then used to train ANN classifiers. Statistics on false alarm
rates, missed detections, and misclassification errors were used to quantify the performance of the
proposed methodology.
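The performance statistics named above can be computed from a record of per-feature-vector decisions. The sketch below shows one plausible set of definitions for the two-class (normal versus faulty) case; the report's exact conventions are not given in this section, and in particular excluding deferred vectors from the detection and false-alarm denominators is an assumption, as are the function and field names.

```python
from dataclasses import dataclass

@dataclass
class DetectionStats:
    p_detection: float     # fraction of decided faulty vectors declared faulty
    p_false_alarm: float   # fraction of decided normal vectors declared faulty
    p_deferral: float      # fraction of all vectors on which the decision was deferred

def detection_stats(truth, decisions):
    """truth: sequence of bools (True = vector comes from a faulty unit).
    decisions: matching sequence of "fault", "normal", or "defer".
    """
    truth, decisions = list(truth), list(decisions)
    if len(truth) != len(decisions) or not truth:
        raise ValueError("need equal-length, non-empty truth and decision sequences")
    decided = [(t, d) for t, d in zip(truth, decisions) if d != "defer"]
    on_faulty = [d for t, d in decided if t]
    on_normal = [d for t, d in decided if not t]
    nan = float("nan")
    return DetectionStats(
        p_detection=sum(d == "fault" for d in on_faulty) / len(on_faulty) if on_faulty else nan,
        p_false_alarm=sum(d == "fault" for d in on_normal) / len(on_normal) if on_normal else nan,
        p_deferral=sum(d == "defer" for d in decisions) / len(decisions),
    )

stats = detection_stats([True, True, False, False, True],
                        ["fault", "defer", "normal", "normal", "fault"])
print(stats)  # DetectionStats(p_detection=1.0, p_false_alarm=0.0, p_deferral=0.2)
```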

1.3 OVERVIEW OF PHASE I RESULTS

Phase I of this effort clearly demonstrated the feasibility of incipient fault detection for
vibrating systems not only under bench test conditions (helicopter gearbox) but also under mild
operating conditions (condensate and fire pumps). Remarkable Phase I results were obtained by
using a balanced combination of CWTs and ANNs. We used the CWT to select features for an
ANN classifier. The wavelet transform provided enough visibility into fault signals to allow us to
reduce the size of the feature set to 10-15 features. We used a low-dimensional, conventional
ANN classifier (Widrow et al., 1988) with rejection of ambiguous classifications. We achieved
0.000 probability of false alarm, 0.000 probability of missed detection, and < 0.04 probability of
deferral (to a subsequent feature vector) for all three data sets provided by the Navy. The major
product of our Phase I work is a single methodology to identify robust features that lead to these
performance levels.

1.4  REPORT  ORGANIZATION 

Section 2 describes the major components of the technical approach and the main results of
this Phase I effort. Section 3 presents the main conclusions and recommendations for future
effort. Appendix A contains an overview of the CWT and presents some examples to give the
reader insight into the time-frequency information provided by the CWT based on the Kiang
wavelet used in this work. Appendix B provides some mathematical background on the CWT.
SECTION 2

PHASE  I  TECHNICAL  EFFORT 

This  section  describes  the  data  used,  the  major  components  of  the  technical  approach,  and 
the  main  results  obtained  for  the  three  types  of  vibrating  systems  considered  in  this  work.  The 
most  important  steps  of  the  technical  approach  are  illustrated  with  selected  examples  based  on  the 
available  data  sets  and  using  the  color  plates  located  just  before  the  body  of  this  report. 

2.1  DATA  USED  IN  PHASE  I 

For  this  research  the  Naval  Command,  Control,  and  Ocean  Surveillance  Center  (NCCOSC) 
supplied  data  for  three  vibrating  systems:  helicopter  gearbox,  condensate  pumps,  and  fire  pumps. 
These  data  are  from  accelerometers  that  measure  vibrations  at  one  or  more  places  on  the  case  of  the 
vibrating  mechanism. 

The gearbox data (from a relatively simple TH-1L helicopter intermediate 42-degree
gearbox,  illustrated  in  Fig.  2-1)  consisted  of  vibration  readings  (sampled  at  48  kHz)  from  two 
accelerometers  (channels  5  and  6,  oriented  with  and  orthogonal  to  the  bearing  load  zones, 
respectively)  mounted  on  the  gearbox  output  end  for  six  separate  fault  conditions:  no  defect  (ND), 
bearing  inner  race  fault  (IR),  bearing  rolling  element  fault  (RE),  bearing  outer  race  fault  (OR),  gear 
spall  fault  (SP),  and  gear  1/2  tooth  cut  fault  (TC).  This  is  a  subset  of  the  “Hollins  data  base,” 
developed  by  Mark  Hollins  of  the  Naval  Air  Test  Center  (NATC).  The  pump  data  consisted  of 
vibration  readings  (sampled  at  50  kHz)  from  two  triaxial  accelerometers  (axial,  radial,  tangential 
channels  were  available)  mounted  on  the  motor  and  pump  ends  of  the  assembly,  one  triad  on  each 
end.  The  condensate  pump  data  consisted  of  eight  data  segments  that  included  two  fault  types  and 
four  unfailed  units.  The  fire  pump  data  consisted  of  16  data  segments  that  included  four  fault  types 
and  eleven  unfailed  units.  Unlike  the  helicopter  data,  which  are  bench  test  data  with  seeded  faults, 
the  pump  data  were  obtained  from  shipboard  pumps  operating  under  relatively  mild  conditions. 
Figure 2-1. Illustration of Intermediate Gearbox


Tables  2-1  and  2-2  present  the  information  available  on  the  condensate  and  fire  pump  data, 
respectively.  In  both  cases,  fault  code  0  is  used  to  label  good  pumps  (normal  cases).  Also,  fire 
pump  fault  codes  3B  and  4A  denote  the  same  fault  type  on  different  pumps. 


TABLE 2-1. CONDENSATE PUMP INFORMATION

Segment Number   Pump ID   Fault Code   RPM
1                CP-1A     1            900
2                CP-1B     0            900
3                CP-4A     2            885
4                CP-4B     0            885
5                CP-2A     0            890
6                CP-2B     0            890
7                CP-3A     0            892
8                CP-3B     2            892

TABLE 2-2. FIRE PUMP INFORMATION

Segment Number   Pump ID   Fault Code   RPM
1                FP-9      3A           3576
2                FP-3      0            3585
3                FP-2      0            3585
4                FP-1      0            3585
5                FP-4      0            3575
6                FP-5      0            3580
7                FP-6      0            3580
8                FP-5A     0            3585
9                FP-6A     4            3580
10               FP-13A    0            3590
11               FP-12     5            3585
12               FP-13     3B           3580
13               FP-14     6            3570
14               FP-17     0            3585
15               FP-16     0            3585
16               FP-15     0            3588

For our test systems we used only one channel, from one sensor; we deferred fusion of
results from multichannel data to Phase II. For the gearbox system, only channel 5 was used for
all  conditions.  For  both  pump  systems,  only  the  axial  component  of  the  pump-end  accelerometer 
triad  was  used.  In  a  sense,  we  deliberately  made  the  Phase  I  problem  harder  by  ignoring  some 
sources  of  information  in  order  to  demonstrate  the  power  of  wavelet  techniques,  or  lack  thereof,  on 
a  fault  detection/classification  problem  more  difficult  than  one  would  expect  to  encounter  in  the 
field  under  more  severe  conditions. 

2.2  SYSTEM  STRUCTURE 

We  adopted  the  conventional  architecture  of  an  adaptive  classifier  (Fig.  2-2):  a  real-time 
preprocessor  to  focus  the  information  about  the  state  of  a  system  into  a  low-dimensional  feature 
vector,  followed  by  an  adaptive  pattern  analyzer  to  map  feature  vectors  into  detections  and 
classifications.  Our  work  emphasized  the  development  of  the  preprocessor,  using  the  insight 
offered  by  recent  advances  in  the  mathematics  of  wavelets. 

Our Phase I proposal indicated that our approach of choice was to use the WPT (Coifman
and Wickerhauser, 1992; Coifman, Meyer, et al., 1990) to identify locations in time/scale space
indicative of faults. This approach met with less success than expected. With hindsight, coupled
with analysis of considerable performance data generated during the project, we believe that the
WPT is most appropriate for problems where classification depends on the timing and internal
structure of transient events, such as classifying biological sounds in the sea. It offers
considerably less advantage in fault analysis of vibrating systems, where the signatures are of
relatively long duration and statistically quite stationary. However, we retain an interest in the WPT
for detecting incipient faults in systems that emit transient signals as part of their normal operation
(e.g., heavy-duty mechanical or electrical switching systems).

Figure 2-2. Incipient Fault Detection and Classification System Structure

As an alternative, we turned to the CWT. Like other image-visualization techniques such as the short-term discrete Fourier transform (DFT), the CWT converts a one-dimensional signal into a two-dimensional image, at considerable computational cost. Sub-bands of the CWT can be evaluated quite efficiently in a preprocessor, however. Therefore, we use the full CWT during the design process to identify a few bands that differentiate among cases, and implement actual feature detectors only for those specific bands, so as to focus the information available in a signal into a small set of features. This allows us to find very small feature vectors (on the order of 10 to 20 elements) that nonetheless yield outstanding detection and classification performance. A brief explanation of the CWT and examples of the CWTs of elementary signals are presented in Appendix A.

For the pattern analyzer, we used conventional three-level, feedforward ANNs. As will be seen, we succeeded in finding feature sets whose clusters are nearly convex and linearly separable, so we did not need complex network topologies or exotic training algorithms. (In fact, we were able to set the number of elements in the hidden layer equal to the number of output elements.)
To improve performance, we suppressed a classification result entirely if the maximum output value was less than some multiple of the second-largest output value, deferring the classification to the next available feature vector. We could trade deferral rate for false alarm/missed detection performance by changing this multiple. A multiple of 2.0 was adequate to eliminate all false alarms and missed detections while keeping the deferral rate below 4%.
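A minimal sketch of this suppression rule follows. It assumes the ANN outputs are non-negative (for example, sigmoid activations); the function name `classify_or_defer` and its `None`-for-deferral convention are illustrative, not taken from the Phase I software.

```python
from typing import Optional, Sequence


def classify_or_defer(outputs: Sequence[float], ratio: float = 2.0) -> Optional[int]:
    """Return the index of the winning output, or None to defer.

    A class is announced only when the largest output is at least `ratio`
    times the second-largest output; otherwise the decision is deferred
    to the next feature vector.
    """
    if len(outputs) < 2:
        raise ValueError("need at least two outputs")
    ranked = sorted(range(len(outputs)), key=lambda i: outputs[i], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if outputs[best] < ratio * outputs[runner_up]:
        return None  # ambiguous: wait for the next feature vector
    return best


print(classify_or_defer([0.9, 0.3, 0.1]))  # 0 (0.9 >= 2 * 0.3)
print(classify_or_defer([0.6, 0.4, 0.1]))  # None (0.6 < 2 * 0.4)
```

Raising `ratio` defers more vectors but makes each announced decision more decisive, which is the trade-off described above.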

The Phase I feature-selection process used a number of tools to develop feature sets. KHOROS signal processing routines (KHOROS Group, 1992), on a SUN computer, supported editing and preliminary analysis of raw data files. A custom Macintosh Pascal package computed the CWT, and another extracted the selected feature vectors from the raw data. Excel, a commercial package from Microsoft, supported the statistical cluster analysis. Macintosh NeuralWorks, a commercial package from NeuralWare, was used to carry out the ANN training and testing. MATLAB, a commercial package from The MathWorks, Inc., was used to compute performance metrics.

2.3  WAVELET-BASED  TUNABLE  PREPROCESSOR 

Figure 2-3 presents the wavelet-based tunable feature extractor developed in this Phase I effort. The CWT is computed using the Kiang wavelet^, which allows one to select the frequency and time resolutions appropriate for extracting the features of interest from the CWT. To eliminate clutter and “spectral speckle,” we smooth and decimate the CWT before extracting the features of interest. The selected bands are then parameterized to achieve better feature separability. Because of the properties of this wavelet-based feature extractor, it is not necessary to compute the entire CWT to extract a few features; only the frequency bands associated with the features of interest need to be computed. This leads to a significant reduction of computational effort if the number of selected features is relatively small (say, 10 to 20). (Note that while this preprocessor is currently implemented in software, it is a good candidate for hardware implementation; on a Macintosh Quadra 900 computer, with no optimization, it runs about 100 times slower than real time.)

^ Named after Nelson Kiang of MIT, who derived tuning curves for auditory nerve neurons in the mid-1960s. For this project, we selected a mother wavelet whose Fourier transform closely matches these tuning curves.


Figure 2-3. Tunable Feature Extractor


As our emphasis in Phase I was on the selection of a concise, focused feature set, we exploited the visibility into time/scale space offered by the CWT. Our objective was to focus all of the potentially available time/frequency information into a small feature space, since a small, separable feature set reduces the complexity required of a classifier, and hence the risk of slow convergence or non-convergence during training.

Another advantage of the CWT is its ability to isolate robust high-energy features that can be readily detected and can be suppressed only by a great deal of external energy. Our approach to feature selection was explicitly to ignore regions of time/scale space with consistently low energy, on the grounds that whatever classification might be possible using such features would not be robust to disturbances. (However, we did include one low-energy band specifically as a disturbance detector, so that classifications would be suppressed whenever substantial energy appeared in this band.) We knew that bench-test or mild-operation data are invariably cleaner than field data (they may not represent a complete range of normal operations or disturbances, and may contain artifacts not present in real data), and therefore we sought features that would serve as well in noisy environments as they do on these data sets. Because we address robustness at the very beginning of the feature-selection process, we are prepared to handle much more challenging data.
2.3.1 The Continuous Wavelet Transform

The CWT appears as an image. It is qualitatively similar to other imaging representations of a signal, such as sonograms, lofargrams, or waterfall displays. The biggest distinction is that one raster line appears in the image for every sample in the signal: there is no windowing of the data, and hence no windowing artifacts in the image. A minor distinction is that the scale axis is logarithmic, since wavelet theory involves continuous scaling of a single basis function. If one thinks of the discrete Fourier transform as the output of a bank of constant-bandwidth filters, then one can think of the CWT as the output of a bank of constant-Q filters, where Q is the ratio of center frequency to bandwidth. (The WPT, on the other hand, is like a variable-bandwidth, variable-Q filter bank (Vetterli and Herley, 1990).)
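To make the filter-bank view concrete, here is a sketch of a constant-Q transform computed with a single FFT over the whole record, so that each row (scale) has one output per input sample and no short-time windowing is involved. The Kiang wavelet is not specified in this section, so the sketch substitutes a Gaussian band-pass response (a Morlet-like analytic wavelet); `cwt_constant_q`, the value `q=10`, and the 12-per-octave scale grid are assumptions.

```python
import numpy as np


def cwt_constant_q(x, fs, centers_hz, q=10.0):
    """Constant-Q filter-bank sketch of a CWT, one FFT for the whole record.

    Row k is the complex output of a Gaussian band-pass filter centered at
    centers_hz[k] with standard deviation centers_hz[k] / q (a Morlet-like
    stand-in for the Kiang wavelet). Column j corresponds to input sample j.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    X = np.fft.fft(x)
    f = np.fft.fftfreq(n, d=1.0 / fs)
    out = np.empty((len(centers_hz), n), dtype=complex)
    for k, fc in enumerate(centers_hz):
        sigma_f = fc / q  # bandwidth grows with center frequency: constant Q
        H = np.where(f > 0, np.exp(-0.5 * ((f - fc) / sigma_f) ** 2), 0.0)
        out[k] = np.fft.ifft(X * H)  # circular convolution over the record
    return out


fs = 48_000
centers = 512.0 * 2.0 ** (np.arange(61) / 12.0)  # 512 Hz to ~16 kHz, 12 per octave
t = np.arange(2_400) / fs                       # 50 msec of signal
x = np.sin(2 * np.pi * 1_935.0 * t)             # tone near the gear mesh fundamental
W = cwt_constant_q(x, fs, centers)
print(W.shape)  # (61, 2400)
print(centers[np.argmax(np.abs(W).mean(axis=1))])
```

Because the Gaussian width is `fc / q`, bandwidth grows in proportion to center frequency, which is the constant-Q property; a short-time DFT would instead use the same bandwidth for every bin.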

Visual  analysis  of  the  CWT  reveals  areas  of  interest  in  vibrational  data.  The  raw  CWT 
magnitudes  include  high-frequency  artifacts  that  can  be  removed  by  smoothing  over  time.  The 
helicopter  data  segments  show  high-energy,  narrowband  features  overlaid  by  short  disturbances 
resulting  from  the  periodic  clash  of  gear  teeth  and  (in  failed  cases)  the  impact  of  bearings  on  defects 
in  the  bearing  tracks.  The  pump  data  segments  show  lower-energy,  broadband  features  overlaid 
by  aperiodic  impulsive  disturbances,  presumably  representing  flow  noise  and  exogenous 
disturbances.  In  all  cases,  features  appear  to  be  stable  over  significant  periods  of  time. 

The  CWTs  presented  in  the  color  plates  referenced  in  the  sequel  (and  located  just  before  the 
body  of  this  document)  illustrate  these  general  observations.  These  CWTs  use  the  Kiang  wavelet 
(fine  frequency,  coarse  time  resolution).  Hue  encodes  the  log  magnitude  of  the  CWT  (blue  =  low, 
red  =  high).  Phase  information  is  ignored.  Time  divisions  are  62.5  msec  wide.  The  frequency 
scale is logarithmic, and frequency divisions correspond to octaves.

2.3.2  Smoothing  the  CWT 

The two images in Plate A are from the first 500 msec of channel 5 of the normal helicopter gearbox data (sampled at 48 kHz). The left image is the CWT sampled every 2 msec; the full CWT for this data segment would be 24,000 pixels high (one line per sample). The bright red line just below 2,048 Hz is the gear mesh fundamental. Harmonics of this fundamental appear at higher frequencies (finer scales), although there appears to be little energy in the fourth, sixth, and seventh harmonics under normal conditions (possibly due to a zero in the transfer function between the gear assembly and the sensor site, at least in the range of the sixth and seventh harmonics). The data appear to be high-pass filtered with a cut-off frequency around 1 kHz, although some frequency lines are visible below this point.

TR-567
ALPHATECH, INC.

Because a fault in a vibrating system affects the sensor signal on every cycle of the mechanism, we seek features that persist over relatively long periods of time (many cycles). While the texture of the raw CWT between harmonic lines is interesting, it would be imprudent to attempt to classify faults based on the structure of this texture. Thus for this data set, and for all other CWT images we constructed, the image was smoothed in the time dimension to suppress high-frequency textures and enhance the stationary elements of the signal. The right image in Plate A shows the results of this smoothing. Among other things, the smoothing enhances the appearance of a secondary line just above the mesh fundamental. Also, some 5 Hz modulation on the fifth harmonic becomes more apparent.
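The time smoothing and 2 msec decimation can be sketched as below. The report does not state the smoothing kernel or its length, so the boxcar average and the 10 msec default in `smooth_and_decimate` are assumptions; at 48 kHz, keeping every 96th column turns the 24,000 per-sample lines of a 500 msec segment into 250 display lines.

```python
import numpy as np


def smooth_and_decimate(cwt, fs, smooth_s=0.010, step_s=0.002):
    """Time-smooth CWT magnitudes, then keep one column every `step_s` seconds.

    cwt: complex array shaped (n_scales, n_samples). A boxcar average of
    `smooth_s` seconds is assumed; mode="same" zero-pads at the record edges.
    Returns log magnitudes in dB.
    """
    mag = np.abs(cwt)
    win = max(1, int(round(smooth_s * fs)))
    step = max(1, int(round(step_s * fs)))
    kernel = np.ones(win) / win
    smoothed = np.array([np.convolve(row, kernel, mode="same") for row in mag])
    return 20.0 * np.log10(smoothed[:, ::step] + 1e-12)


rng = np.random.default_rng(0)
noisy = rng.standard_normal((4, 24_000)) + 1j * rng.standard_normal((4, 24_000))
image = smooth_and_decimate(noisy, fs=48_000)
print(image.shape)  # (4, 250)
```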

Note  that  we  do  not  present  this  entire  image  to  the  ANN  for  classification.  Our  goal  is  to 
identify  features  in  the  image  that  can  be  parameterized,  and  to  compute  much  more  concise 
parameter vectors from the image to submit to the net.

Note  also  that  the  CWT  visualization  makes  the  feature-selection  process  quite  efficient. 
Having  developed  a  methodology  on  the  gearbox  data,  we  were  able  to  construct  low-dimensional 
feature  sets  for  the  two  pump  data  sets  in  a  matter  of  hours. 

2.3.3  Changing  the  Wavelet  Basis 

The two images in Plate B are from the first 500 msec of failed condensate pump data (CP-1A from Table 2-1). The left image is the analog of the smoothed helicopter CWT. It immediately
shows  the  lack  of  stable,  narrowband  elements  in  the  signal.  Again,  it  would  be  imprudent  to 
attempt  classification  based  on  the  microstructure  of  the  transient  narrowband  features  in  this 
image.  Instead,  we  seek  more  broadband  features. 
One appeal of the CWT is that it allows one to vary time/scale resolution continuously. By changing the wavelet on which the transform is based, one can sacrifice resolution in scale, which is exactly what is necessary to find broadband features. The right image shows the same data, but uses a wavelet transform with 1/10th the frequency resolution (along with additional smoothing over time to suppress transients and textures). This image clearly shows the locations of high energy content, and these regions are surprisingly stable compared to those of the left image.
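The trade-off exploited here can be quantified for a Gaussian-envelope wavelet, used below as an assumed stand-in for the Kiang wavelet: if the band-pass response has standard deviation sigma_f = fc/Q, the time envelope has standard deviation sigma_t = 1/(2*pi*sigma_f), so cutting frequency resolution by a factor of ten shortens the wavelet tenfold. The helper name and the 1 kHz example are illustrative.

```python
import math


def gaussian_wavelet_spreads(fc_hz: float, q: float) -> tuple[float, float]:
    """Return (sigma_f in Hz, sigma_t in s) for a Gaussian band-pass wavelet.

    Assumes a Gaussian frequency response with sigma_f = fc / q; its Fourier
    pair is a Gaussian time envelope with sigma_t = 1 / (2 * pi * sigma_f).
    """
    sigma_f = fc_hz / q
    sigma_t = 1.0 / (2.0 * math.pi * sigma_f)
    return sigma_f, sigma_t


fine_f, fine_t = gaussian_wavelet_spreads(1_000.0, q=10.0)
coarse_f, coarse_t = gaussian_wavelet_spreads(1_000.0, q=1.0)  # 1/10th the frequency resolution
print(f"fine:   {fine_f:.0f} Hz, {fine_t * 1e3:.3f} ms")    # fine:   100 Hz, 1.592 ms
print(f"coarse: {coarse_f:.0f} Hz, {coarse_t * 1e3:.3f} ms")  # coarse: 1000 Hz, 0.159 ms
```

The product sigma_f * sigma_t is the same for both settings, so resolution is traded between frequency and time rather than gained outright.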
2.3.4  Channel  Selection 

The two images in Plate C are from the first 500 msec of unfailed fire pump data (FP-3 from Table 2-2). The left image is from the radial channel, the right from the axial channel. Lines for the first few harmonics of the shaft rotation frequency (about 60 Hz) are clearly visible, along with a faint harmonic series based at about 400 Hz, and a broadband signal from 300 Hz to 1,500 Hz.

As  mentioned  earlier,  we  limited  Phase  I  feature  selection  and  failure  detection/classification 
analysis  to  a  single  channel  of  data  for  each  equipment  type  (so  that  we  did  not  exhaust  all  of  the 
potential  processing  gain  on  these  clean  data,  and  thus  can  offer  ways  to  counter  the  additional 
complexity  one  would  expect  in  field  data).  We  selected  the  axial  channel  alone  for  further 
processing  for  the  pump  cases,  and  channel  5  alone  for  the  gearbox  case. 

Since we had available another channel of helicopter gearbox data (channel 6, the only
other  data  channel  provided  to  us  by  NCCOSC),  however,  we  decided  to  conduct  a  preliminary 
analysis  of  these  data  in  order  to  explore  the  practical  issues  that  must  be  confronted  when  dealing 
with  more  complicated  vibrating  systems  where  more  than  one  channel  of  sensor  data  is  available, 
either  from  the  same  location  or  from  different  locations.  Recall  that  channels  5  and  6  come  from 
accelerometers  at  the  gearbox  output  end;  channel  5  is  oriented  with  the  bearing  load  zone,  and 
channel  6  is  oriented  orthogonal  to  this  direction. 

Using the same parameters and approach used for channel 5 data, we computed the CWTs from channel 6 data for each of the gearbox fault conditions. Discussion of the results is included in the two following subsections.
2.3.5 Gearbox Fault Signatures

Plate D shows segments of the CWT for the six cases of gearbox data. Each image represents 250 msec of signal, from 512 Hz to 16 kHz. Note the difference in the breadth and texture of the CWT structure around the third harmonic (about 5.8 kHz) of the mesh frequency (1,935 Hz).

Plate E shows a similar segment for channel 6 data. Note that the CWT features revealed by channel 6 data are not as sharply defined as those from channel 5; this is especially clear for the third harmonic in the normal case. The fifth harmonic (9,675 Hz) of the mesh frequency serves to separate the Inner Race and Rolling Element faults from the normal case. Also, the fourth harmonic appears to separate all five fault cases from the normal case.

These findings show that by using features from both channels 5 and 6, a better distinction could be drawn among the six fault conditions. The best way to fuse these sources of information, however, was not addressed in the Phase I effort; we plan to pursue this subject during Phase II.

2.3.6  Gearbox  Fault  Masks 

Plate F shows the same segments of the CWT for the six cases of gearbox data, with low-energy regions masked out. Our rationale is to avoid features that are weak, as they are easily compromised by disturbances or interference and hence do not contribute to high-reliability detection. The masking technique is a simple morphological filter applied across scale, which masks areas that fall below an estimated noise floor. Note how this technique highlights the significant differences in structure around the third harmonic, and also around the 1,050 Hz line. Techniques such as this morphological filter provide quantitative insight into feature set performance before time and energy are spent training and testing an adaptive classifier.
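One way to implement such a mask is sketched below. The report does not give the filter's details, so the per-column noise-floor estimate (a low percentile across scale), the 6 dB margin, and the use of a binary opening along the scale axis to discard runs narrower than `min_width` bins are all assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _running(op, a, width):
    """Apply `op` (np.min or np.max) over a centered window along axis 0."""
    pad = width // 2
    padded = np.pad(a, ((pad, pad), (0, 0)), mode="edge")
    return op(sliding_window_view(padded, width, axis=0), axis=-1)


def mask_low_energy(cwt_db, margin_db=6.0, min_width=3, pct=20.0):
    """Mask CWT bins that are not clearly above an estimated noise floor.

    cwt_db: array shaped (n_scales, n_times) in dB. The floor is estimated per
    time column; a binary opening across scale (erosion, then dilation)
    removes above-floor runs narrower than `min_width` scale bins.
    """
    if min_width % 2 == 0:
        raise ValueError("min_width must be odd")
    floor = np.percentile(cwt_db, pct, axis=0, keepdims=True)
    above = (cwt_db > floor + margin_db).astype(np.uint8)
    opened = _running(np.max, _running(np.min, above, min_width), min_width)
    keep = opened.astype(bool)
    return np.where(keep, cwt_db, np.nan), keep


demo = np.zeros((60, 4))
demo[20:23] = 30.0  # narrow line (3 scale bins)
demo[40:55] = 20.0  # broadband high-energy region
demo[10] = 25.0     # isolated single-bin speckle
masked, keep = mask_low_energy(demo)
print(np.flatnonzero(keep[:, 0]))  # bins 20-22 and 40-54 survive; bin 10 is removed
```

Regions at least `min_width` bins wide pass the opening intact, so broadband energy is preserved while single-bin speckle is discarded.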

Plate  G  shows  a  similar  segment  for  channel  6.  Note  that  the  significant  differences 
between  channel  5  and  channel  6  CWTs  become  clearer  because  of  the  masking  of  low-energy 
regions. 
2.3.7 Condensate Pump Signatures

Plate  H  shows  the  CWTs  for  125  msec  of  the  axial  channel  for  each  of  the  eight  condensate 
pump  data  segments  in  Table  2-1.  The  log  frequency  scale,  between  16  Hz  and  16  kHz,  is  divided 
into  octaves.  Segment  1  contains  a  fault  of  type  1,  segments  3  and  8  each  contain  a  fault  of  type  2, 
and  the  rest  are  good  pumps  (normal  case).  Note  that  the  CWTs  show  clear  differences  between 
pumps  in  pairs  of  segments  1  and  2,  3  and  4,  and  7  and  8.  Each  of  these  pairs  includes  a  normal 
pump  and  a  defective  pump.  On  the  other  hand,  segments  5  and  6,  both  from  good  pumps, 
display  similar  high-energy  features. 

In contrast with the helicopter gearbox data, the condensate pump CWTs contain wider high-energy regions, but, as in the gearbox case, these regions are relatively stable over time.

2.4  FEATURE  SEPARATION 

Detection and classification become exceptionally easy if the clusters of features corresponding to different cases exhibit two properties: convexity and separability. In that situation, classification becomes a matter of estimating boundaries to separate the clusters, and we can allocate one element of the hidden layer of an ANN to each cluster.

One never knows ahead of time whether or not a feature set will be convex and separable. We selected 500 msec of data from each test case as a basis for statistical analyses of separability. We used features that essentially correspond to a few frequency slices through the CWT, that is, energies in particular frequency bands. We selected the set of bands on the basis of overall energy content; recall that robust classification is possible only from features with high energy differences between cases. For the gearbox and fire pump data, which have strong, clear, narrow fundamentals, we adapted the band center frequencies to the center of that line (using the features themselves instead of referring to external synchronization signals). For the condensate pump data (which lack such a stable reference feature and whose energy is more dispersed across frequency), we kept the frequency bands constant.
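The adaptive band energies can be sketched as follows. In place of the CWT sub-band filters, this sketch uses an FFT of a frame long enough to resolve narrow bands (the 0.1 s frame is an assumption); it re-centers on the strongest line within ±2% of a nominal fundamental, the limit the report quotes for the fire pumps, and then measures energy in 1/35-octave bands at the harmonics, the gearbox band width the report lists. Function names are hypothetical.

```python
import numpy as np


def _power_spectrum(frame, fs):
    spec = np.abs(np.fft.rfft(frame)) ** 2
    return np.fft.rfftfreq(frame.size, d=1.0 / fs), spec


def track_fundamental(frame, fs, nominal_hz, tol=0.02):
    """Frequency of the strongest spectral line within +/- tol of nominal."""
    f, spec = _power_spectrum(frame, fs)
    sel = (f >= nominal_hz * (1 - tol)) & (f <= nominal_hz * (1 + tol))
    if not sel.any():
        raise ValueError("frame too short to resolve the search range")
    return f[sel][np.argmax(spec[sel])]


def harmonic_band_energies(frame, fs, nominal_hz, n_harmonics=6, width_oct=1 / 35):
    """Energy (dB) in `width_oct`-octave bands centered on tracked harmonics."""
    f0 = track_fundamental(frame, fs, nominal_hz)
    f, spec = _power_spectrum(frame, fs)
    half = 2.0 ** (width_oct / 2)
    feats = []
    for k in range(1, n_harmonics + 1):
        band = (f >= k * f0 / half) & (f <= k * f0 * half)
        feats.append(10.0 * np.log10(spec[band].sum() + 1e-12))
    return f0, np.array(feats)


fs = 48_000
t = np.arange(4_800) / fs  # 0.1 s frame, 10 Hz bin spacing
frame = np.sin(2 * np.pi * 1_950.0 * t) + 0.3 * np.sin(2 * np.pi * 3_900.0 * t)
f0, feats = harmonic_band_energies(frame, fs, nominal_hz=1_935.0)
print(f0)  # about 1950 Hz: the bands follow the observed line, not the nominal value
```

For the condensate pumps, where the bands stay fixed, one would skip `track_fundamental` and use constant centers.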

After collecting candidate features, but before training an ANN, we evaluated feature cluster separations. Table 2-3 shows the maximum separations between all pairs of clusters in terms of Fisher coefficients, which are essentially distances between cluster centers normalized to units of standard deviation. This table was obtained by computing, for each feature (element of the feature vector), the Fisher coefficients between all fault condition pairs, and then selecting, for every fault condition pair, the maximum coefficient across all features. Any pair of clusters more than three or so units apart should be readily separable by an ANN classifier.
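The separation measure can be sketched as below. The text does not give the exact normalization, so this version assumes the absolute difference of the two cluster means divided by the pooled (root-mean-square) standard deviation, computed per feature, with the largest value over features kept for each pair of cases; names and data are illustrative.

```python
from itertools import combinations
from statistics import mean, pstdev


def fisher_units(a, b):
    """|mean(a) - mean(b)| in units of the pooled standard deviation (assumed form)."""
    pooled = ((pstdev(a) ** 2 + pstdev(b) ** 2) / 2.0) ** 0.5
    diff = abs(mean(a) - mean(b))
    if pooled == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / pooled


def max_pairwise_separation(clusters):
    """clusters maps case name -> list of equal-length feature vectors.

    For every pair of cases, returns the largest per-feature separation.
    """
    table = {}
    for c1, c2 in combinations(clusters, 2):
        n_features = len(clusters[c1][0])
        table[(c1, c2)] = max(
            fisher_units([v[j] for v in clusters[c1]], [v[j] for v in clusters[c2]])
            for j in range(n_features)
        )
    return table


clusters = {
    "Normal": [[0.0, 10.0], [1.0, 11.0], [2.0, 12.0]],
    "Fault": [[4.0, 10.5], [5.0, 11.5], [6.0, 12.5]],
}
print(max_pairwise_separation(clusters))  # feature 0 dominates: 4 / sqrt(2/3), about 4.90
```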

TABLE 2-3. PHASE I CLUSTER SEPARATIONS, HELICOPTER DATA

                    NL      IR      RE      OR      SP      TC
Normal             0.00    5.67   10.79    9.79   11.14   11.28
Inner Race         5.67    0.00    4.13    5.25   13.31    6.05
Rolling Element   10.79    4.13    0.00    2.07    5.89    3.45
Outer Race         9.79    5.25    2.07    0.00    7.18    3.29
Gear Spall        11.14   13.31    5.89    7.18    0.00    8.61
Tooth Cut         11.28    6.05    3.45    3.29    8.61    0.00

We  found  that  the  CWT  features  provide  good  separation  between  cases.  Below  is  a 
summary  of  the  main  characteristics  of  these  features  for  each  of  the  test  systems. 

Helicopter gearbox feature vectors (channel 5) contain heights of narrowband (1/35-octave) lines:

• center frequencies adapt to changes in the fundamental mesh frequency

• lines were selected at the first six harmonics of the mesh frequency, plus two other frequencies suggested by the CWT (0.5525f and 2.7f, where f is the fundamental mesh frequency)

• feature clusters are nearly ellipsoidal

• minimum feature cluster separation is 2.07 Fisher units, or standard deviations (outer race/rolling element)

Condensate pump feature vectors (axial channel) contain energies in wider (1/6-octave) regions:

• center frequencies are fixed over time (and cases)

• bands were selected at octave intervals (32 to 1,024 Hz), and at 1/4-octave intervals within high-energy octaves (64 to 128 Hz, 512 to 1,024 Hz)

• feature clusters are nearly ellipsoidal

• minimum feature cluster separation is 5.87 Fisher units (CP-1A/CP-4B)

Fire pump feature vectors (axial channel) contain both narrowband and broadband features:

• center frequencies adapt to changes in the fundamental shaft frequency (limited to within 2% of nominal)

• narrow bands were selected at the first 8 shaft harmonics, and at octave intervals within the high broadband energy region (512 to 2,048 Hz)

• some feature clusters show suspiciously high correlation (possibly due to clipping)

• minimum feature cluster separation is 2.23 Fisher units (Fault 3/Fault 6)
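As a worked reading of the condensate pump band list above, the sketch below enumerates octave centers from 32 Hz to 1,024 Hz plus the interior quarter-octave points of the 64 to 128 Hz and 512 to 1,024 Hz octaves, each with a 1/6-octave band around it. Interpreting "1/4-octave intervals within" an octave as its three interior points is an assumption; under it the list has 12 bands, which would be consistent with the 12 input elements listed for the condensate pumps in Table 2-4.

```python
def condensate_bands(width_oct=1 / 6):
    """Return (low, center, high) in Hz for each assumed condensate pump band."""
    octave_centers = [32.0 * 2 ** k for k in range(6)]  # 32, 64, ..., 1,024 Hz
    quarter_points = [lo * 2 ** (q / 4) for lo in (64.0, 512.0) for q in (1, 2, 3)]
    half = 2 ** (width_oct / 2)
    return [(c / half, c, c * half) for c in sorted(octave_centers + quarter_points)]


for low, center, high in condensate_bands():
    print(f"{center:7.1f} Hz  ({low:7.1f} to {high:7.1f} Hz)")
```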

The  following  three  subsections  graphically  illustrate  with  color  scattergrams  the  striking 
feature  separation  for  some  feature  pairs  and  all  fault  conditions  for  each  of  the  test  systems 
examined  in  this  work. 

2.4.1  Gearbox  Separation 

Plate I presents the feature clusters computed from 3 seconds of gearbox data across all six
cases.  Feature  vectors  can  be  obtained  every  10  msec,  so  there  are  about  300  sample  vectors  here. 
This  plate  shows  the  projection  of  the  feature  clusters  onto  a  two-dimensional  subspace  defined  by 
the  power  found  in  the  second  harmonic  of  the  mesh  frequency,  and  at  a  subharmonic  line  around 
1,050  Hz.  Note  that:  1)  all  of  the  feature  clusters  appear  convex,  2)  the  normal  case  is  well 
separated  from  the  fault  cases  by  these  two  features  alone,  and  3)  several  pairs  of  faults  can  be 
separated  as  well.  Other  pairs  of  features  provide  different  kinds  of  separation,  but  all  show 
convex  clusters. 

2.4.2  Condensate  Pump  Separation 

Plate  J  presents  the  feature  clusters  from  0.5  second  of  condensate  pump  data  across  all 
eight  cases.  There  are  about  30  feature  vectors  per  case.  It  shows  the  projection  of  the  feature 
clusters  onto  the  two-dimensional  subspace  defined  by  the  power  near  615  Hz,  and  near  1,024  Hz. 
Again, note that: 1) all of the feature clusters appear convex, 2) the normal cases are well separated
from  the  fault  cases  by  these  two  features  alone,  despite  being  more  diffuse  due  to  variations 
among  units,  and  3)  type  1  and  type  2  faults  can  be  clearly  separated  as  well.  Other  pairs  of 
features  provide  different  kinds  of  separation,  but  all  show  convex  clusters. 

2.4.3  Fire  Pump  Separation 

Plate K presents the feature clusters computed from 0.5 second of fire pump data across all 16 cases. There are about 30 feature vectors per case. It shows the projection of the feature clusters onto the two-dimensional subspace defined by a narrow band around the seventh harmonic of the shaft rate and a wider band around 2,048 Hz. Note some suspicious characteristics of these clusters. The seventh harmonic of the normal data seems to be limited by a floor at 42 dB below the shaft fundamental, making detection and classification of fault type 3A (light blue squares) exceptionally easy. Also, for three of the test cases (one normal and two fault), the values of these features are exactly 6 dB apart (a factor of two in amplitude), a relationship that is highly unlikely in truly random data. It is important to supplement the power of data-driven approaches with some understanding of the physics of the system under study (why classification regions are the way they are) in order to gain confidence that the classification logic is truly robust to any artifacts that may be in the training data.

2.4.4  Artifacts  in  Training  Data 

Through  an  example,  this  subsection  illustrates  the  need  to  select  features  for  classification 
based  not  only  on  their  separation  but  also  on  the  physics  of  the  fault  mechanism.  The  risk  in  not 
doing  this  is  that  data  presented  to  the  ANN  classifier  may  contain  variations  upon  which 
classification  may  be  based,  but  which  bear  no  causal  relationship  to  fault  mechanisms.  For 
example,  we  selected  a  feature  at  the  sixth  harmonic  of  the  mesh  frequency  for  the  gearbox  data  to 
serve  as  a  disturbance  detector.  In  this  frequency  range  (12  kHz),  there  is  very  little  energy  in  any 
of the data unless a disturbance is present. Our idea was that if a feature vector with relatively high energy in this band (above -45 dB relative to the power in the mesh fundamental) were presented to the neural net, it would result in an ambiguous classification, and any output would be deferred until the disturbance subsided.
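An explicit version of the disturbance gate can be sketched as follows (the report instead relied on the ANN producing an ambiguous output). It assumes band levels in dB, treats a sixth-harmonic level above -45 dB relative to the mesh fundamental as a disturbance, and uses hypothetical index arguments for where those two features sit in the vector.

```python
from typing import Callable, Optional, Sequence


def gated_decision(
    features_db: Sequence[float],
    classify: Callable[[Sequence[float]], str],
    fundamental_idx: int = 0,
    disturbance_idx: int = 5,
    threshold_db: float = -45.0,
) -> Optional[str]:
    """Defer (return None) while the disturbance band is unusually energetic."""
    relative_db = features_db[disturbance_idx] - features_db[fundamental_idx]
    if relative_db > threshold_db:
        return None  # substantial energy in the normally quiet band: defer
    return classify(features_db)


quiet = [60.0, 50.0, 45.0, 40.0, 35.0, 10.0]      # sixth harmonic 50 dB below fundamental
disturbed = [60.0, 50.0, 45.0, 40.0, 35.0, 20.0]  # only 40 dB below: treated as a disturbance
print(gated_decision(quiet, lambda f: "Normal"))      # Normal
print(gated_decision(disturbed, lambda f: "Normal"))  # None
```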
Quite another thing happened. Plate L shows the gearbox feature clusters projected onto the subspace defined by the fifth and sixth harmonic power levels. Note the obvious separation among {Normal, Inner Race}, {Outer Race, Rolling Element, Tooth Cut}, and {Gear Spall}.

The fact that the sixth harmonic power levels are so small suggests that this frequency is near a zero in the transfer function between the vibrating mechanics and the sensor. The fact that their variation is so small suggests that the energy in this band is largely background energy, or is conveyed through a convoluted transmission path. In either case, the variations among cases are unlikely to be caused by the faults themselves, but rather by the process of inserting and removing faults. An adaptive classifier will happily use the power level at the sixth harmonic to separate the Normal case from, say, the Gear Spall case. Only additional insight into the physics of the transmission mechanism, or a set of data including several insertions of the same fault, would prevent field deployment of a classifier that treats this insertion artifact as a valid source of information.

2.4.5 Guidelines for Finding Robust Feature Sets

Based on this Phase I effort, we have developed a set of guidelines for finding robust feature sets in CWT images. These guidelines can be summarized as follows:

• Be sure that features are robust to external disturbances: we seek high-energy features derived from morphological filtering along the scale axis of the CWT, with narrow bandwidths to reduce their sensitivity to impulsive disturbances. Features must also be redundant, so that the correlation among features can be exploited, and they must be computed as frequently as the largest time constant in the preprocessor permits.

• Be sure that features are diverse: features should span a wide range of frequencies, for example, the first six harmonics of important narrowband vibrations (gearbox) or octave samples of broadband components (pumps). In addition, one or more low-energy features should be included to support disturbance rejection.

• Be sure that features distinguish normal from abnormal conditions: for this we need to compute statistics (mean, standard deviation) on each CWT bin and look for significant differences that will lead to features with high discriminating power.
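The third guideline, combined with the first, can be sketched as a per-bin separation map over stacks of CWT images: the mean and standard deviation of each bin are computed for normal and abnormal examples, and bins that are low in energy in both cases are zeroed so that they are not selected. The array layout and the `min_level_db` cut-off are assumptions made for illustration.

```python
import numpy as np


def bin_separation_map(normal_db, fault_db, min_level_db=None):
    """Per-bin |mean difference| / pooled std between two stacks of CWT images.

    normal_db, fault_db: arrays shaped (n_images, ...) with matching trailing
    shape (for example, scales x time). Bins whose mean level is below
    `min_level_db` in both cases are set to 0.
    """
    mu_n, mu_f = normal_db.mean(axis=0), fault_db.mean(axis=0)
    pooled = np.sqrt((normal_db.var(axis=0) + fault_db.var(axis=0)) / 2.0)
    score = np.abs(mu_n - mu_f) / np.maximum(pooled, 1e-12)
    if min_level_db is not None:
        score = np.where(np.maximum(mu_n, mu_f) >= min_level_db, score, 0.0)
    return score


rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 10))
fault = rng.normal(0.0, 1.0, size=(200, 10))
fault[:, 3] += 5.0  # one bin carries a strong, consistent difference
print(np.argmax(bin_separation_map(normal, fault)))  # 3
```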
2.5 ARTIFICIAL NEURAL NETWORK CLASSIFIER

For an adaptive classifier, we used a fully interconnected feedforward ANN of the back-propagation type with one input layer, one hidden layer, and one output layer. The number of processing elements (PEs) in the input layer varied with the vibrating system, between 12 and 15. The number of output PEs also varied with the vibrating system, according to the number of fault conditions, including the normal case. Because the feature vector clusters were convex and linearly separable for all the systems of Phase I, the number of hidden-layer PEs was set equal to the number of output PEs. Table 2-4 presents the number of PEs per layer for each of these systems.


TABLE 2-4. NUMBER OF PROCESSING ELEMENTS PER ANN LAYER

LAYER        HELICOPTER GEARBOX   CONDENSATE PUMPS   FIRE PUMPS
Input                15                  12              13
Hidden                6                   8              16
Output                6                   8              16
Total PEs            27                  28              45

For the design, training, and testing of the ANNs, we used a commercial software package, NeuralWorks (NeuralWare, 1992), running on a Macintosh platform. For each of the three systems, convergence to the specified RMS error between the desired and the actual outputs occurred relatively quickly, after between 5,000 and 10,000 random presentations of the feature vectors included in the training set. Given the simplicity of the ANNs used in this work, their small size, and the excellent feature-cluster separation made possible by judicious use of the CWT, no sophisticated training algorithms were required.
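For readers without NeuralWorks, a comparable network can be sketched in NumPy: sigmoid units, a hidden layer the same size as the output layer, online back-propagation on randomly drawn training vectors, and a stop once the RMS output error falls below a target. The learning rate, initialization, stopping check, and synthetic data are assumptions, not NeuralWorks settings or Phase I data.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ThreeLayerNet:
    """Input -> hidden -> output network of sigmoid units (squared-error loss)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W1 = self.rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = self.rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        h = sigmoid(x @ self.W1 + self.b1)
        return h, sigmoid(h @ self.W2 + self.b2)

    def train(self, X, T, lr=0.5, max_presentations=10_000, target_rms=0.05):
        """Online back-propagation; returns the number of presentations used."""
        for step in range(1, max_presentations + 1):
            i = self.rng.integers(len(X))  # random presentation
            h, y = self.forward(X[i])
            d_out = (y - T[i]) * y * (1.0 - y)           # output-layer error term
            d_hid = (d_out @ self.W2.T) * h * (1.0 - h)  # back-propagated to hidden layer
            self.W2 -= lr * np.outer(h, d_out)
            self.b2 -= lr * d_out
            self.W1 -= lr * np.outer(X[i], d_hid)
            self.b1 -= lr * d_hid
            if step % 500 == 0:
                _, Y = self.forward(X)
                if np.sqrt(np.mean((Y - T) ** 2)) < target_rms:
                    return step
        return max_presentations


# Hypothetical data: three well-separated clusters of 15-element feature vectors.
rng = np.random.default_rng(1)
centers = rng.normal(0.0, 1.0, (3, 15))
X = np.vstack([c + 0.1 * rng.normal(size=(50, 15)) for c in centers])
labels = np.repeat(np.arange(3), 50)
T = np.eye(3)[labels]
net = ThreeLayerNet(n_in=15, n_hidden=3, n_out=3)
used = net.train(X, T)
_, Y = net.forward(X)
print(used, (Y.argmax(axis=1) == labels).mean())
```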

From the test set results, we computed the following measures of effectiveness for each
vibrating system: probability of false alarm, probability of missed detection, probability of
misclassification, and probability of deferral. Probability of false alarm is the probability that a
fault is announced when there is no fault present. Probability of missed detection is the
probability that no fault is announced when there is a fault present. Probability of misclassification
is the probability that a fault type is announced when a different fault type is present. Probability
of deferral is the probability that the classifier defers a decision when a case for decision (a feature
vector) is presented to it.
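The four measures can be estimated directly from the true conditions and the classifier's decisions on a test set. The sketch below is a minimal Python illustration, not the authors' code. It assumes that deferred decisions are recorded as `None`, that the no-fault class is labeled `"normal"` (both hypothetical conventions), and that the false-alarm, missed-detection, and misclassification probabilities are taken over decided cases only; the report does not state its denominators.

```python
NORMAL = "normal"  # hypothetical label for the no-fault condition
DEFER = None       # hypothetical marker for a deferred decision


def effectiveness(true_labels, decisions):
    """Estimate the four Phase I measures of effectiveness from test-set results.

    Assumption: false-alarm, missed-detection, and misclassification
    probabilities are taken over decided (non-deferred) cases only;
    the deferral probability is taken over all cases.
    """
    n_total = len(decisions)
    n_deferred = sum(d is DEFER for d in decisions)
    decided = [(t, d) for t, d in zip(true_labels, decisions) if d is not DEFER]

    normal = [d for t, d in decided if t == NORMAL]
    faulty = [(t, d) for t, d in decided if t != NORMAL]

    def frac(count, n):
        return count / n if n else 0.0

    return {
        # Fault announced while no fault is present.
        "false_alarm": frac(sum(d != NORMAL for d in normal), len(normal)),
        # No fault announced while a fault is present.
        "missed_detection": frac(sum(d == NORMAL for _, d in faulty), len(faulty)),
        # A fault type announced while a different fault type is present.
        "misclassification": frac(
            sum(d != NORMAL and d != t for t, d in faulty), len(faulty)),
        "deferral": frac(n_deferred, n_total),
    }


truth = ["normal"] * 4 + ["fault_a"] * 2 + ["fault_b"] * 2
calls = ["normal", "fault_a", None, "normal", "fault_a", "normal", "fault_a", "fault_b"]
print(effectiveness(truth, calls))
```

The fault labels `fault_a` and `fault_b` are placeholders for the seeded-fault conditions.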

For the purpose of this work, a feature vector leads to an ambiguous situation when the
absolute difference between the two largest competing outputs is less than some specified
tolerance. In these cases, the classifier refuses to announce a decision and considers the next
feature vector. The consequence of this deferral is to decrease the probabilities of false alarm and
missed detection, and to increase the time delay for a classifier decision. For instance, feature
vectors for the gearbox system are computed approximately every 10 msec, so the price paid for
each deferral (or rejection) is an additional delay of about 10 msec in the time to detect a fault.
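The deferral rule above can be sketched as follows. This is a minimal illustration, assuming the ANN produces one activation per class; the class names and the tolerance value are hypothetical, since the report does not give the tolerance it used. The second function shows how each deferral adds one feature-vector period (taken here as 10 msec, as in the text) to the time before a decision is announced.

```python
import numpy as np


def decide(outputs, class_names, tolerance):
    """Return the winning class, or None (defer) when the top two outputs are too close."""
    outputs = np.asarray(outputs, dtype=float)
    order = np.argsort(outputs)[::-1]          # class indices, largest activation first
    best, runner_up = outputs[order[0]], outputs[order[1]]
    if abs(best - runner_up) < tolerance:
        return None                             # ambiguous: wait for the next feature vector
    return class_names[order[0]]


def first_announcement(stream, class_names, tolerance, frame_ms=10.0):
    """Scan successive output vectors; return (decision, added delay in msec).

    Each deferral costs one feature-vector period (frame_ms) before a decision is announced.
    """
    for k, outputs in enumerate(stream):
        label = decide(outputs, class_names, tolerance)
        if label is not None:
            return label, k * frame_ms
    return None, len(stream) * frame_ms
```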

The statistical correlation between successive feature vectors depends on the time constants
embedded in the wavelet preprocessor designed to extract the selected feature set. Extracting
feature vectors at a period exceeding the largest of these time constants leads to (approximate)
statistical independence between successive feature samples. Since the ANN classification process
is memoryless, this in turn assures (approximate) statistical independence between successive
classifications. The period between feature reports can nevertheless be quite short. Table 2-5 presents
the maximum time constants and the number of feature vectors per second allowed by these time
constants for the three vibrating systems of Phase I.


TABLE 2-5. PREPROCESSOR TIME CONSTANTS AND FEATURE VECTOR RATES

UNIT                 MAX TIME CONSTANT   FEATURE VECTORS PER SECOND
Helicopter gearbox   10 msec             100
Condensate pump      25 msec             40
Fire pump            25 msec             40
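The rates in Table 2-5 follow from requiring the spacing between feature vectors to be at least the longest preprocessor time constant, so the rate can be at most the reciprocal of that constant. A minimal Python check, assuming the time constants are expressed in milliseconds:

```python
import math


def max_independent_rate(max_time_constant_ms):
    """Largest whole number of feature vectors per second whose spacing is
    no shorter than the preprocessor's longest time constant."""
    return math.floor(1000.0 / max_time_constant_ms)


for unit, tau_ms in [("Helicopter gearbox", 10), ("Condensate pump", 25), ("Fire pump", 25)]:
    print(unit, max_independent_rate(tau_ms))
```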

We maintain statistical independence between successive feature vectors (i.e., we compute them
at intervals no shorter than the longest time constant in the preprocessor) precisely so that temporal
fusion of classification results is simple. For the gearbox data, we computed feature vectors every
10 msec, and classified each and every one. To make the classification process robust against
transient disturbances, we can simply compare output classifications over a window of, say, 100
feature samples (one second of data). If, say, fewer than 95 of the classifications agree, we
assume that a disturbance (and not a fault) is present. This dramatically reduces the theoretical
probability of false alarm, at the cost of an additional second of delay in producing a warning, a
very attractive tradeoff for most situations.
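The windowed agreement test described above might look like the following sketch. Several details are our assumptions rather than the report's: the window slides forward one feature vector at a time, deferred decisions (recorded as `None`) count as not agreeing with anything, and `"disturbance"` is a hypothetical output label meaning "no consensus in this window."

```python
from collections import Counter, deque

DISTURBANCE = "disturbance"  # hypothetical label for "no consensus in this window"


def fuse_window(decisions, min_agree=95):
    """Fuse one window of per-vector decisions (None marks a deferral)."""
    counts = Counter(d for d in decisions if d is not None)
    if not counts:
        return DISTURBANCE
    label, n = counts.most_common(1)[0]
    # Announce the majority label only if enough decisions agree with it.
    return label if n >= min_agree else DISTURBANCE


def sliding_fusion(stream, window=100, min_agree=95):
    """Yield one fused decision per new feature vector once the window is full."""
    buf = deque(maxlen=window)
    for d in stream:
        buf.append(d)
        if len(buf) == window:
            yield fuse_window(buf, min_agree)
```

With 100 vectors per second for the gearbox, the first fused decision is available after one second of data, which is the extra delay mentioned in the text.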

2.6  FAULT  DETECTION  AND  IDENTIFICATION  RESULTS  FROM  PHASE  I 

Given  the  preceding  insight  into  the  derivation  of  high-energy  wavelet  features  and  the 
convex,  separable  clusters  they  form  in  feature  space,  it  should  be  no  surprise  that  good 
classification  results  are  possible.  Providing  a  feature  set  that  captures  the  important  discriminants 
between  normal  operation  and  faults  vastly  simplifies  the  problem  of  designing  an  adaptive 
classifier  that  achieves  good  performance. 

The performance results for each of the test systems are presented in Table 2-6. The
acceptance threshold is the ratio between the maximum output value and the next largest output
value for a given feature vector. The complexity value is the total number of PEs in the
corresponding ANN. For the gearbox and the condensate pump systems, the test set was
independent of the training set; for the fire pump data these two sets were the same.

TABLE 2-6. PHASE I PERFORMANCE RESULTS

                                   GEARBOX   CONDENSATE PUMP   FIRE PUMP
TRAINING SET SIZE                  1125      240               480
TEST SET SIZE                      6750      1400              4800
ACCEPTANCE THRESHOLD               1.4       1.2               2.0
PROBABILITY OF FALSE ALARM         0.000     0.000             0.000
PROBABILITY OF MISSED DETECTION    0.000     0.000             0.000
PROBABILITY OF DEFERRAL            0.035     0.020             0.020
PROBABILITY OF MISCLASSIFICATION   0.046     0.000             0.000
COMPLEXITY                         27 PEs    28 PEs            45 PEs

The performance results in Table 2-6 clearly show that the wavelet feature sets selected
above permit perfect detection performance with very low deferral rates. While these results are
pleasing, we feel that an even more important principle has been demonstrated. We used exactly
the same method to find features for the pumps as we used for the gearbox data. There was no
trial and error for the pump classifiers; these results come from the very first feature sets we picked.
This offers limited but important evidence that our results are not accidental: we have a
methodology to analyze data from vibrating systems and derive small, focused feature sets that
support high-confidence fault detection and classification.

2.7  ANALYSIS  OF  FALSE  ALARM  AND  DEFERRAL  PROBABILITIES 

The  performance  results  in  the  previous  subsection  for  the  helicopter  gearbox  system  are 
based  on  a  test  set  containing  1,125  feature  vectors  for  each  of  the  fault  conditions.  To  get  greater 
insight  into  the  detector’s  performance  and  the  data  presently  available,  we  decided  to  compute 
performance  measures  for  the  normal  case  for  a  much  longer  test  set  (10  times  longer)  and  to 
examine  more  closely  the  time  histories  of  false  alarms  and  deferrals  and  the  tradeoff  between  these 
two measures. This subsection presents the results of this analysis. In particular, the insight
gained through this analysis suggests additional means to improve the detector's performance,
means that will be especially useful when dealing with helicopter gearboxes operating under
more severe environmental conditions.

2.7.1  Procedure 

The longer test set for the gearbox normal case is the longest that can be extracted from the
data available to us during Phase I. This data set contains 120 seconds of channel 5 vibration
readings sampled at 48 kHz. Since, as before, a feature vector is computed for every 512 samples,
that is, every 10.67 msec, this data set provides 11,205 feature vectors for the normal case, after
eliminating a small number of edge-effect-contaminated feature vectors computed from the
beginning of the data set. These feature vectors are then passed through the gearbox ANN
classifier previously trained with 1,125 feature vectors from all fault conditions, including the
normal case. This trained ANN is the same ANN used to obtain the performance results of the
previous subsection.
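The timing figures used throughout this subsection follow from the sampling rate and frame length. The sketch below computes the feature-vector period, an upper bound on the number of feature vectors in a 120-second record (assuming non-overlapping 512-sample frames, which is our reading of "a feature vector is computed for every 512 samples"), and the conversion from a time-history feature-vector number to elapsed time given in the footnote to Fig. 2-4. The reported 11,205 vectors fall within this bound.

```python
FS_HZ = 48_000        # sampling rate of the channel 5 record
FRAME_SAMPLES = 512   # samples per feature vector
EDGE_DISCARD = 20     # leading feature vectors dropped for edge effects (footnote to Fig. 2-4)

FRAME_MS = 1000.0 * FRAME_SAMPLES / FS_HZ   # feature-vector period, about 10.67 msec


def max_feature_vectors(duration_s):
    """Upper bound on non-overlapping 512-sample feature vectors in a record."""
    return (duration_s * FS_HZ) // FRAME_SAMPLES


def vector_number_to_ms(n):
    """Convert a time-history feature-vector number to elapsed time in msec."""
    return (n + EDGE_DISCARD) * FRAME_MS


print(round(FRAME_MS, 2), max_feature_vectors(120), round(vector_number_to_ms(0), 2))
```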

2.7.2 Gearbox False Alarm and Deferral Probabilities

The performance results for the longer test set described above appear in Table 2-7.
These results include only the false alarm and deferral probabilities for different acceptance
thresholds for the helicopter gearbox system. Since this test set contains only normal (no-defect)
case feature vectors, probabilities of missed detection and misclassification are not included
in this table.


TABLE 2-7. FALSE ALARMS AND DEFERRALS FOR HELICOPTER GEARBOX

Acceptance threshold          0        5        6.7      10
Probability of false alarm    0.0159   0.0118   0.0116   0.0114
Probability of deferral       0        0.026    0.037    0.057
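The tradeoff in Table 2-7 can be traced by sweeping the acceptance threshold over the ANN outputs for the normal-case test set. In the sketch below, the false-alarm probability is computed over accepted (non-deferred) vectors only; this is our inference, and it is consistent with the reported figures (178 of 11,205 is 1.59%, and 125 false alarms among the roughly 96.3% of 11,205 vectors not deferred is about 1.16%). The function name and the assumption of strictly positive (e.g., sigmoid) outputs are ours.

```python
import numpy as np


def threshold_tradeoff(outputs, normal_index, thresholds):
    """Return (threshold, P(false alarm), P(deferral)) rows for normal-case data.

    outputs: array of shape (n_vectors, n_classes) holding strictly positive ANN
    activations (e.g. sigmoid outputs) for feature vectors known to be normal.
    """
    outputs = np.asarray(outputs, dtype=float)
    top2 = np.sort(outputs, axis=1)[:, -2:]      # columns: second largest, largest
    ratio = top2[:, 1] / top2[:, 0]              # acceptance statistic per vector
    winner = outputs.argmax(axis=1)
    rows = []
    for th in thresholds:
        accepted = ratio >= th                    # below threshold -> deferral
        n_acc = accepted.sum()
        p_def = 1.0 - n_acc / len(outputs)
        false_alarms = np.sum(accepted & (winner != normal_index))
        p_fa = false_alarms / n_acc if n_acc else 0.0
        rows.append((th, float(p_fa), float(p_def)))
    return rows
```

Raising the threshold converts some would-be false alarms into deferrals, which is the pattern visible across the columns of Table 2-7.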

Table 2-7 shows that when no deferrals are allowed the probability of false alarm is
relatively low (1.59%), and for a deferral probability (or deferral rate) as low as 3.7% the false
alarm probability is 1.16%. These results are encouraging given that: 1) the test set is 10 times
longer than the entire training set, 2) the test set includes “everything,” that is, all the vibration
effects that were recorded during the entire 120 seconds of test bench operation, possibly including
disturbances whose nature and duration were unknown to us, and 3) the trained ANN did not
include a disturbance (or “strange effects”) detector. (A well-designed and trained disturbance
detector would be expected to drive down the false alarm probability by eliminating possible false
alarm triggers.)

2.7.3  False  Alarm  Time  History  and  Interarrival  Times 

Examination  of  the  false  alarm  time  history  for  a  given  deferral  rate  allows  one  to 
determine,  for  instance,  whether  false  alarms  occur  in  clusters  or  not,  whether  the  false-alarm 
temporal  distribution  displays  some  regular  pattern,  and  whether  clusters,  if  any,  are  indicative  of 
data characteristics unaccounted for by the training set. In addition, comparison of
false alarm time histories for different deferral rates allows one to assess the tradeoff between false
alarm and deferral probabilities.
Figure 2-4 shows the false alarm time history for a 0% deferral rate over the 11,205 epochs
(approximately 120 seconds) at which feature vectors from the gearbox normal case were
presented to the ANN classifier for a decision. The top part of the figure represents approximately
the first 60 seconds, and the bottom part represents the last 60.* The number of false alarms for a
0% deferral rate is 178. Figure 2-4 shows that when deferrals are not allowed, false alarms occur
in two clusters on the left of the figure and rather randomly everywhere else. A discussion of this
finding is presented later in this section.
Figure 2-4. False-Alarm Time History for 0% Deferral Rate

A histogram of the false-alarm interarrival times appears in Fig. 2-5. Selected false-alarm
interarrival-time statistics are presented in Table 2-8. Examination of the interarrival times shows
that most of them are equal to 10.67 msec (one feature-vector period) and come from the two clusters mentioned above.


*  For  this  and  subsequent  time-history  figures,  note  the  following:  To  convert  feature  vector  number  to  time  in 
msec,  the  feature  vector  number  must  be  increased  by  20  (corresponding  to  the  first  20  feature  vectors  discarded 
because  of  edge  effects)  and  then  multiplied  by  10.67  msec,  which  is  the  time  interval  over  which  each  feature  vector 
is  computed. 
Interarrival time (msec); each histogram cell is 20 msec wide.


Figure 2-5. False-Alarm Interarrival-Time Histogram for 0% Deferral Rate

TABLE 2-8. FALSE-ALARM INTERARRIVAL-TIME STATISTICS FOR SEVERAL DEFERRAL RATES

Deferral rate           0%         3.7%      5.7%
Minimum (msec)          10.67      10.67     10.67
Maximum (msec)          23,573     138.67    85.33
Mean (msec)             457.22     12.90     12.01
Std. deviation (msec)   2,625.7    13.76     9.16
No. of false alarms     178        125       120
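Statistics of the kind in Table 2-8 can be computed from the feature-vector numbers at which false alarms occur, since consecutive alarms are whole multiples of the 10.67-msec period apart (the 85.33-msec maximum, for example, is eight periods). A minimal sketch follows; whether the report used the sample or population standard deviation is not stated, and the sample form used here is an assumption.

```python
import statistics

FRAME_MS = 1000.0 * 512 / 48_000   # feature-vector period, about 10.67 msec


def interarrival_stats(alarm_numbers):
    """Summarize gaps between false alarms given their feature-vector numbers.

    Needs at least three alarms (two gaps) for the standard deviation.
    """
    idx = sorted(alarm_numbers)
    gaps = [(b - a) * FRAME_MS for a, b in zip(idx, idx[1:])]
    return {
        "min_ms": min(gaps),
        "max_ms": max(gaps),
        "mean_ms": statistics.mean(gaps),
        "std_ms": statistics.stdev(gaps),   # sample standard deviation (an assumption)
        "n_alarms": len(idx),
    }
```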

The false-alarm time history for a 3.7% deferral rate is presented in Fig. 2-6. Note that by
allowing the deferral rate to increase to the rather low value of 3.7%, the false alarms beyond the first
cluster in Fig. 2-4 are eliminated; the number of false alarms for a 3.7% deferral rate decreases to
125 (from 178 for a 0% rate).
Figure 2-6. False-Alarm Time History for 3.7% Deferral Rate

The  histogram  of  the  corresponding  false-alarm  interarrival  times  is  presented  in  Fig.  2-7. 
All  of  the  false  alarm  interarrival  times,  except  five,  are  equal  to  10.67  msec.  Selected  statistics  are 
presented  in  Table  2-8. 


Interarrival time (msec); each histogram cell is 20 msec wide.
Figure 2-7. False-Alarm Interarrival-Time Histogram for 3.7% Deferral Rate

The false-alarm time history for a 5.7% deferral rate is presented in Fig. 2-8. Note that
even after allowing the deferral rate to increase further to 5.7%, most of the false alarms
in the first cluster in Fig. 2-6 still remain; in fact, the number of false alarms for a 5.7% deferral
rate decreases only to 120 (from 125 for a 3.7% rate). All of the corresponding false-alarm
interarrival times, except three, are equal to 10.67 msec. Selected statistics are presented in Table
2-8.
Figure 2-8. False-Alarm Time History for 5.7% Deferral Rate


The  above  finding — the  persistence  of  the  false  alarm  cluster  in  Fig.  2-8  even  after 
allowing  the  deferral  rate  to  increase  to  5.7%  — suggests  that  some  unknown  disturbance  occurred 
at  that  time  interval,  approximately  between  the  2nd  and  the  3rd  second  of  the  data  set.  Therefore, 
we  examined  the  first  4  seconds  of  the  data  set  in  great  detail. 

Figure 2-9 displays the accelerometer readings for the first 4 seconds of channel 5 of the
helicopter gearbox normal case. Note that at about 2.0 sec from the beginning of the
record a disturbance (of unknown origin) begins to develop, lasting until about 2.7 sec; from that
point until 3.6 sec the signal drops out to zero.

In Fig. 2-8, comparing the time interval where the false alarm cluster occurs (between 1.9
and 3.5 sec) with the time interval where the unknown disturbance occurs immediately reveals why
the false alarm cluster does not go away even when the deferral rate is allowed to increase from 0%
to 5.7%. More importantly, this correspondence between a disturbance and a false alarm cluster
clearly suggests that if we had included a well-trained disturbance detector in our ANN classifier,
the false alarm probabilities would have been much lower.
Figure 2-9. Accelerometer Readings for First 4 Seconds of Channel 5 — Normal Case

The above finding has important practical implications for the work that we plan to conduct
during Phase II of this effort. Since helicopter gearboxes operating under more severe
environmental conditions are subject to many kinds of disturbances, the above finding suggests
that detection performance under real-world conditions could be improved significantly if a
disturbance detector is added to the ANN classifier. In addition, making the detector
self-monitoring, by giving it the capability to watch its own false alarm time history for clusters,
would further improve its performance.
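A self-monitoring detector needs some rule for deciding that false alarms are clustered. One simple possibility, sketched below under our own assumptions, is to group alarm feature-vector numbers into runs whose successive gaps are small; the `max_gap` and `min_size` parameters are hypothetical tuning values, not quantities from this study.

```python
def find_clusters(alarm_numbers, max_gap=1, min_size=5):
    """Group false-alarm feature-vector numbers into clusters.

    Alarms belong to the same cluster when successive numbers differ by at most
    max_gap vectors; clusters smaller than min_size are ignored. Both
    parameters are hypothetical tuning values.
    Returns a list of (first_number, last_number, alarm_count) tuples.
    """
    clusters, current = [], []
    for n in sorted(alarm_numbers):
        if current and n - current[-1] > max_gap:
            if len(current) >= min_size:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(n)
    if len(current) >= min_size:
        clusters.append((current[0], current[-1], len(current)))
    return clusters
```

A detector could, for example, treat the interval spanned by any reported cluster as a suspected disturbance rather than a fault.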

2.7.4  Deferral  Time  History  and  Interarrival  Times 

The analysis of the deferral time history complements the corresponding analysis of false
alarms. By analyzing the deferral time history one can determine, for instance, whether deferrals
occur in clusters or are more sparsely distributed, and whether a given deferral rate produces an
unacceptable temporal distribution of delays for a given false alarm probability. As discussed
previously, the price paid for every deferral is a delay equal to the time interval over which a feature
vector is computed; for the gearbox system every deferral introduces a delay of 10.67 msec in the
classifier’s announcement of a classification decision.
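Because every deferral costs one 10.67-msec period, the delay a deferral sequence imposes can be read off the per-vector deferral flags. The sketch below reports both the total delay and the longest single delay (a run of consecutive deferrals), which is what a clustered deferral history would make large. The function name and the boolean-flag input are our assumptions.

```python
FRAME_MS = 1000.0 * 512 / 48_000   # each deferral costs one feature-vector period


def deferral_delays(deferred):
    """From per-vector deferral flags, return (total delay, longest single delay) in msec.

    A run of k consecutive deferrals postpones the next decision by k periods.
    """
    total = longest = run = 0
    for flag in deferred:
        if flag:
            run += 1
            total += 1
            longest = max(longest, run)
        else:
            run = 0
    return total * FRAME_MS, longest * FRAME_MS
```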

The deferral time history for a deferral rate of 3.7% is displayed in Fig. 2-10. This deferral
rate corresponds to a false alarm probability of 1.16%. Note the presence of a deferral cluster in
about the same time interval as the disturbance-caused false alarm cluster in Fig. 2-4. This
suggests again that if a disturbance detector is included in the ANN classifier, both the number of
false alarms and the number of deferrals should decrease, an effect that clearly contributes to
overall performance improvement of the classifier.


Figure  2-10.  Deferral  Time  History  for  3.7%  Deferral  Rate 

Figure  2-11  displays  the  deferral  time  history  for  a  5.7%  deferral  rate,  which  corresponds 
to  a  false  alarm  probability  of  1.14%.  Note  that  as  the  number  of  deferrals  increases,  as  it  should, 
the  number  of  deferral  clusters  does  not  increase  significantly,  which  indicates  that  the  increase 
from  3.7%  to  5.7%  in  the  deferral  rate  does  not  result  in  significant  additional  concentrated  time 
delays  in  the  sequence  of  classifier  decisions.  Selected  statistics  for  deferral  interarrival  times  for 
deferral  rates  of  3.7%  and  5.7%  are  included  in  Table  2-9. 
Figure 2-11. Deferral Time History for 5.7% Deferral Rate
TABLE 2-9. DEFERRAL INTERARRIVAL-TIME STATISTICS

Deferral rate           3.7%      5.7%
Minimum (msec)          10.67     10.67
Maximum (msec)          6,901     4,992
Mean (msec)             288.18    185.6
Std. deviation (msec)   797.31    467.32
No. of deferrals        410       640

SECTION 3

CONCLUSIONS  AND  RECOMMENDATIONS 

3.1  CONCLUSIONS  FROM  THE  PHASE  I  EFFORT 

Our  Phase  I  performance  results  speak  for  themselves.  Incipient  fault  detection  and 
classification  on  bench-test  and  mild-operation  data  is  quite  feasible,  even  without  exploiting  many 
additional  techniques  available.  The  CWT  provides  images  from  which  feature  selection  is  easy. 
These  features  are  simple  and  robust  (high  energy,  narrow  bandwidth). 

There  are  no  technological  impediments  to  practical  implementation.  The  selected  wavelet 
features  can  be  computed  using  off-the-shelf  digital  filtering  hardware.  ANNs  can  be  trained  and 
employed  using  off-the-shelf  techniques. 

The helicopter gearbox data used were bench test data. Bench test data cannot support a
complete characterization of normal operating regimes and cannot include a complete set of
disturbances to be encountered in the field. Moreover, seeded fault data cannot span the complete
set of possible failures, and may contain artifacts that assist detection. In summary, bench tests are
not reality. Reality is much less controlled, and hence much less predictable. Although the pump
data were obtained under actual operating conditions, these conditions were relatively benign and
do not include all possible operating conditions. Therefore, the principal conclusions we draw from
Phase I are that these results (like any Phase I results) should be greeted with skepticism concerning
their applicability to field systems, and that any Phase II effort must focus on support for analysis and
design with real data under widely varied operating regimes.

We are well positioned to make the transition to real data in Phase II because of the
methodology we established in Phase I. We have a specific procedure for selecting low-dimensional,
robust feature sets, and have demonstrated its efficacy on the two pump data sets.
We deliberately avoided using all available processing techniques to solve the Phase I
problem: we wanted to be sure that a simple problem could be solved with simple techniques.
We arbitrarily limited ourselves to single-channel processing, with instantaneous classification.
This leaves substantial additional processing capability in reserve to meet the challenges posed by
using real data, and we recommend exploiting it to the fullest in Phase II.

3.2  PHASE  II  RECOMMENDATIONS 

Phase I demonstrated the feasibility of achieving good fault detection and classification
performance based on a single channel of accelerometer data for each of the three vibrating systems
studied in this effort. Preliminary analysis of the helicopter gearbox data from a different channel
(channel 6) showed that there is indeed additional information in other channels that could be
profitably used to improve the ANN classifier’s performance. Analysis of false alarm time histories
for the helicopter gearbox indicated that to drive down the false alarm probability it is necessary to
characterize as completely as possible the normal behavior of the vibrating system, including all possible
kinds of disturbances. This characterization will require the analysis of vibration data from
systems operating under the broadest possible range of environmental conditions.

The major output of Phase I is a methodology for failure detection and identification from
accelerometer data that can be applied to any vibrating system having the characteristics of the
systems included in this effort. This methodology has been successfully demonstrated for test
bench data and mild-operation data. To convert this methodology into a practical and useful tool for
real vibrating systems, it is indispensable to test it with real-world data from systems operating under
more severe environmental conditions. Therefore, Phase II must concentrate on this task, and its
successful performance will depend critically on the availability of this type of data.

Information obtained during the course of this Phase I effort indicates that a government
agency (NCCOSC/NRaD) will soon collect extensive amounts of seeded fault data and unfailed
flight data for helicopter gearboxes, which will be available to researchers involved in failure
detection and identification. Extensive helicopter gearbox data might in principle be obtained from
helicopter manufacturers such as Boeing or Sikorsky, with whom we have already held
preliminary technical interchanges. Given this projected availability of helicopter data, the need for
Phase II to focus on real-world data, and DARPA’s continuing interest in high-payoff
applications, we base the recommendations presented next on the assumption that the
Phase II effort will concentrate exclusively on incipient detection and identification of helicopter
gearbox failures.

The major output of Phase II must be a preliminary design of a hardware/software system
for real-time incipient detection of helicopter gearbox failures under operational conditions. This
design will be driven by timing, sizing, and other implementation tradeoffs, which will in turn be
determined by the outcome of additional research and development to be conducted during Phase II.

The main issues to be addressed in Phase II include: 1) use of multichannel data,
particularly for separating airframe vibrations from internal gearbox vibrations, 2) use of situational
data that characterize the different environmental conditions under which helicopters operate, to
schedule the classification weights (and perhaps even the feature set) as conditions change, 3)
clarification of the maintenance concept for helicopter gearboxes (e.g., when does ANN training
take place?), and 4) selection of the best technology for Phase III implementation. We discuss
these topics in more detail below.

Use  of  multichannel  data:  One  is  not  limited  to  single-channel  accelerometer  data  in  many 
potential  applications.  Some  issues  to  be  addressed  include:  How  can  one  process  additional 
channels?  Is  there  advantage  to  using  a  vector  wavelet  transform  to  process  all  channels 
simultaneously?  What  can  wavelet  phase  information,  neglected  in  this  effort,  contribute  to 
separating  internal  and  external  sources  of  vibration?  Can  statistical  techniques  derived  for  use  in 
wavelet  transform  space  be  adapted  to  vector  transforms?  Is  there  a  need  for  different  techniques 
on  different  channels?  To  what  extent  can  accelerometers  mounted  on  the  supports  of  a  vibrating 
system,  rather  than  on  its  casing,  supply  information  about  environmental  vibrations  and 
disturbances? How should this information influence the feature-selection process?

Use  of  situational  data:  We  believe  that  the  most  practical  approach  to  incipient  failure 
detection  is  to  characterize  as  much  as  possible  all  manifestations  of  gearbox  normal  behavior.  If 
this characterization is successful, for any given situation the classifier will be able to detect any
failure condition. One key advantage of this approach is that in principle it only requires unfailed
data, which are easier and safer to collect than seeded or unseeded fault data. On the other hand,
this approach requires the collection of situational data, that is, data representing the most important
environmental conditions under which the helicopter operates. This type of data includes, for
instance, altitude, speed, weather data, load, maneuvers being conducted, commanded engine
torque, and kinematic acceleration. Situational data could be used as parameters to characterize the
different regions of normal behavior. Issues to be addressed include: Which group of situational
variables provides a satisfactory delimitation of different normal operating regimes? Which
elements should be included in a software capability to allow the effective use of situational data for
detection of abnormal conditions? Is it better to use situational variables to parameterize different
classifiers or to include situational variables as inputs to a more general classifier?

Clarification  of  the  maintenance  concept:  There  are  several  issues  related  to  incorporating 
incipient  failure  detectors  into  a  genuine  helicopter  gearbox  maintenance  concept.  Is  the  ANN 
trained  for  a  specific  platform,  or  for  an  entire  class?  Is  it  retrained  after  major  overhauls?  What 
are  the  relative  merits  of  detection  alone  vs.  detection  and  classification?  What  are  acceptable  false 
alarm  rates?  Are  there  other  meaningful  outputs  from  an  incipient  fault  detector  (e.g.,  rate  of 
development  of  an  anomaly)  that  can  be  useful  to  an  operator?  And,  of  course,  what  response  time 
is  required,  and  how  can  that  time  be  used  to  process  a  series  of  feature  vectors  in  order  to  reduce 
false  alarms  and  increase  detection  reliability? 

Selection of implementation technology: The Phase I effort yielded a preprocessor/classifier
design that, while evaluated using non-real-time software emulations, can be readily
implemented in hardware for real-time operation. We anticipate that the Phase II enhancements will
yield equally simple real-time computation requirements. To preserve the possibility of immediate
Phase III application, Phase II should resolve the basic hardware design issues for the real-time
element. What is the overall complexity of the processing? If implemented digitally, what
throughput is needed (counted as fixed-point multiplies per second), and what opportunities for
parallelism or pipelining exist? If implemented in analog circuitry, what component tolerances are
necessary? If implemented as a mix of digital and analog elements, what is the best role for each?

There are some other technical areas that would benefit from additional research attention:
Can one perform the statistical evaluation of wavelet features directly in wavelet-transform space
(i.e., for all candidate features) rather than for selected features in a post-transform analysis? Can
one apply advanced multiscale estimation techniques to extract different features at different time
scales? How effective might some additional wavelet feature types be (e.g., wavelet packet
coefficients for switching systems, or multiscale autoregressive model identification)? Can recent
work on the interpretation of ANN outputs as likelihood functions be directly linked to the
statistical performance analysis currently done in feature space?

In summary, we recommend that the central objective of Phase II be to deliver a
preliminary design of a hardware/software system for real-time detection of helicopter gearbox
failures under real-world operating conditions. This design will include the capability to use
multichannel sensor data and situational data, and its performance will be measured according to
the requirements of a clear maintenance concept prepared during Phase II. Phase II also should be
used to plan for and gather information about the resources required for a successful
implementation in Phase III of the technology developed in the two previous phases.

APPENDIX A

THE  CONTINUOUS  WAVELET  TRANSFORM 

This  appendix  presents  the  basics  of  the  Continuous  Wavelet  Transform,  which  underlies 
our  methodology  for  selecting  features. 

A.1 WAVELET BASES

Wavelets  are  a  new  approach  to  an  old  problem — building  complicated  functions  out  of 
simple  elements.  Fourier  analysis  builds  complicated  functions  out  of  sine  and  cosine  functions. 
Wavelets  use  functions  with  limited  time  extent,  so  that  they  can  better  represent  time-localized 
aspects  of  a  function  than  can  a  sum  of  infinite-duration  sines  and  cosines. 

All  wavelet  analyses  are  based  on  dilations,  contractions,  and  translations  of  a  basic  mother 
wavelet.  If  the  dilations,  contractions,  and  translations  are  orthogonal  to  one  another,  then  the 
wavelets form an orthonormal basis which is complete in many cases. One unique characteristic of
wavelet  analysis  is  that  there  are  an  infinite  number  of  choices  for  a  mother  wavelet,  allowing  one 
to  continuously  vary  the  tradeoff  between  time  and  frequency  localization.  Figure  A-1  shows  the 
Haar  wavelet,  on  the  left,  and  the  Kiang  wavelet,  on  the  right.  Figure  A-2  shows  dilated  and 
contracted  translates  of  the  Haar  wavelet. 


Figure  A-1.  The  Haar  and  Kiang  Wavelets 


In its most general form, a mother wavelet is any function of t that is essentially time- and band-limited, subject to the limitations of the uncertainty principle. (The uncertainty principle states that good time localization is obtained at the expense of frequency localization, and vice versa.)
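One standard quantitative form of this trade-off is the Heisenberg-Gabor inequality. Let $\sigma_t$ be the RMS duration of a finite-energy function, that is, the spread of $|g(t)|^2$ about its center. Let $\sigma_\omega$ be its RMS bandwidth in rad/s, the spread of $|\hat{g}(\omega)|^2$ about its center. Then

$$\sigma_t\,\sigma_\omega \ge \frac{1}{2}$$

Equality holds only for Gaussian-windowed complex exponentials. Halving a wavelet's duration therefore doubles the lower bound on its bandwidth.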
An affine wavelet family is obtained from dilations and translations of the mother wavelet. Affine wavelets can be either continuous or discrete (usually dyadic). Dyadic wavelet families dilate by powers of two and translate by integer multiples of the same power of two. Continuous wavelet families dilate by arbitrary scale factors and translate by arbitrary amounts.
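Orthonormality of a dyadic family can be checked numerically. The minimal sketch below uses the Haar wavelet from Figure A-1. It builds the members $a^{-1/2} g((t-b)/a)$ with dilation $a = 2^j$ and translation $b = k\,2^j$ on an assumed interval [0, 8). It then confirms that the Gram matrix of their inner products is the identity. The interval, the range of j, and the grid step are illustrative assumptions.

```python
import numpy as np

def haar(t):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    return np.where((t >= 0) & (t < 0.5), 1.0,
                    np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

def dyadic_member(t, j, k):
    """Dyadic family member: scale a = 2**j, shift b = k * 2**j, unit norm."""
    a = 2.0 ** j
    return haar((t - k * a) / a) / np.sqrt(a)

dt = 1.0 / 1024                    # grid step; a power of two, so breakpoints fall on samples
t = np.arange(0.0, 8.0, dt)        # assumed analysis interval [0, 8)

# Every member with scale 2**j (j = -2..2) whose support lies inside [0, 8)
members = [(j, k) for j in range(-2, 3) for k in range(int(8 / 2.0 ** j))]
W = np.array([dyadic_member(t, j, k) for j, k in members])
gram = (W @ W.T) * dt              # matrix of inner products <psi_jk, psi_j'k'>
```

Because every Haar breakpoint lands exactly on a sample, the Riemann sums here are exact up to rounding. The identity Gram matrix is therefore not a discretization accident.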


A.2 WAVELETS AND SPECTRA

Each  mother  wavelet  has  a  characteristic  footprint  in  time/frequency  space — a  region  where 
it  is  sensitive  to  energy.  Dilate  it  (or  contract  it)  by  a  factor  of  two,  and  this  region  expands  (or 
contracts)  by  a  factor  of  two  along  the  time  axis,  and  translates  by  an  octave  down  (or  up)  along 
the  frequency  axis.  Translate  it,  and  the  footprint  moves  an  equal  amount  along  the  time  axis. 
Take  a  set  of  dilations,  contractions,  and  translations  that  cover  all  of  time/frequency  space,  and 
you  have  a  complete  basis  set.  Figure  A-3  illustrates  a  typical  subdivision  by  wavelets  of  the  time- 
frequency  plane. 
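The octave shift under dilation can be checked numerically. This appendix does not give a formula for the Kiang wavelet, so the sketch assumes a Morlet-like mother wavelet instead. That wavelet is a Gaussian-windowed complex exponential with a hypothetical center frequency of 5 rad/s. The sketch measures the RMS duration and spectral centroid of $a^{-1/2} g(t/a)$. Doubling a should double the duration and halve the center frequency, which moves the footprint down one octave.

```python
import numpy as np

OMEGA0 = 5.0  # hypothetical center frequency (rad/s) of the illustrative mother wavelet

def mother(t):
    # Morlet-like wavelet (an assumption, not the Kiang wavelet)
    return np.exp(-t**2 / 2) * np.exp(1j * OMEGA0 * t)

def footprint(a, dt=0.01, half_width=200.0):
    """Return (RMS duration, spectral centroid in rad/s) of g_a(t) = a**-0.5 * g(t/a)."""
    t = np.arange(-half_width, half_width, dt)
    g = mother(t / a) / np.sqrt(a)
    p_t = np.abs(g) ** 2                          # energy density in time
    duration = np.sqrt(np.sum(t**2 * p_t) / np.sum(p_t))
    w = 2 * np.pi * np.fft.fftfreq(t.size, d=dt)  # angular frequency of each FFT bin
    p_w = np.abs(np.fft.fft(g)) ** 2              # energy density in frequency
    centroid = np.sum(w * p_w) / np.sum(p_w)
    return duration, centroid

d1, c1 = footprint(1.0)   # mother wavelet
d2, c2 = footprint(2.0)   # dilated by a factor of two
```

For this Gaussian example the duration is $a/\sqrt{2}$ and the centroid is $\omega_0/a$. Going from a = 1 to a = 2 therefore gives the factor-of-two stretch in time and the one-octave drop in frequency described above.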

[Figure: areas of the time-frequency plane covered by different wavelet basis functions; horizontal axis: time localization]

Figure  A-3.  Typical  Subdivision  by  Wavelets  of  the  Time-Frequency  Plane 
The exciting discoveries that kindled the recent interest in wavelets concern the existence of such complete bases that are also orthonormal, and for which the mother wavelet has finite support (is non-zero over a finite length of time).

A.3 WAVELET TRANSFORMS

Because there is an infinite number of mother wavelets, there is an infinite number of wavelet transforms. By varying the choice of mother wavelet, one can obtain wavelet transforms that differ in both time and frequency localization.

An affine wavelet transform is the set of coefficients that multiply elements of a wavelet family in order to represent a time function. Affine wavelet transforms can be either continuous or discrete. Continuous wavelet transforms yield a (generally complex-valued) function of time and scale (frequency). Discrete (usually dyadic) wavelet transforms yield a finite number of coefficients for an essentially bounded region of time and scale (frequency). All wavelet transforms contain implicit or explicit feature detectors, one for each basis function. Figure A-4 presents a classification of affine and non-affine wavelets based on their support (columns) and the extent of their filter realizations (rows).
Figure A-4. Classification of Wavelets


Wavelet selection affects preprocessor design. Efficient real-time feature extraction imposes constraints on possible mother wavelets: we require causality and a finite-dimensional realization of the wavelet generator. In this effort, we constrained the choice of mother wavelet by our need to implement a feature extractor without requiring advances in electronics technology. In particular, we limited our choices to wavelets whose corresponding filters have a finite-dimensional realization. All wavelets with finite support possess this property, but they must be very long to localize the narrowband energy that is so obvious in the gearbox data. Therefore, we selected a wavelet with semi-infinite support, so that it has a causal realization and a low-dimensional implementation. This wavelet was derived from auditory nerve response data collected by Kiang in the 1960s, and we have honored him by giving the wavelet his name.
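This appendix does not give a formula for the Kiang wavelet. The sketch below therefore uses a generic stand-in to show why a semi-infinite, causal wavelet admits a low-dimensional realization. The stand-in is a two-pole recursive resonator with a hypothetical pole radius `R` and pole angle `THETA`. Its impulse response, $r^n \sin((n+1)\theta)/\sin\theta$, is a decaying oscillation that starts at n = 0 and never terminates. Even so, the filter carries only two state variables from sample to sample. By contrast, an FIR realization of a finite-support wavelet needs one coefficient per sample of support, which becomes expensive when narrowband selectivity demands a long wavelet.

```python
import numpy as np

R = 0.98                      # hypothetical pole radius (sets the decay rate)
THETA = 2 * np.pi * 0.05      # hypothetical pole angle (center frequency, rad/sample)

def resonator(x, r=R, theta=THETA):
    """Causal two-pole IIR filter: y[n] = x[n] + 2 r cos(theta) y[n-1] - r**2 y[n-2].

    Only two numbers (y[n-1], y[n-2]) are carried from sample to sample,
    yet the impulse response does not end after any finite number of samples.
    """
    a1, a2 = 2.0 * r * np.cos(theta), -r * r
    y = np.zeros(len(x))
    y1 = y2 = 0.0                 # state: the previous two outputs
    for n, xn in enumerate(x):
        yn = xn + a1 * y1 + a2 * y2
        y[n] = yn
        y1, y2 = yn, y1
    return y

impulse = np.zeros(300)
impulse[0] = 1.0
h = resonator(impulse)            # semi-infinite response, truncated here to 300 samples
```

Causality is visible directly: an impulse delayed to sample 100 produces zero output before sample 100, followed by the same decaying oscillation.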

A.4 SIMPLE EXAMPLES OF CONTINUOUS WAVELET TRANSFORMS

This section presents color images of Continuous Wavelet Transforms (CWTs) of some basic functions. They are intended to give the reader greater insight into the time-frequency representation of signals provided by the CWT based on the Kiang wavelet. In these images, the frequency scale is logarithmic and runs vertically, increasing toward the top, with frequency divisions corresponding to octaves from 16 Hz to 16 kHz. The time scale runs horizontally, increasing to the right, with time divisions 62.5 msec wide. Hue encodes the log magnitude of the CWT (blue = low, red = high). Phase information is ignored.
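The display geometry described above reduces to simple arithmetic. The sketch assumes the octave divisions are exact powers of two above 16 Hz, so the nominal 16 kHz top is 16 × 2^10 = 16,384 Hz. It uses the 48 kHz sampling rate quoted for the Plate M examples. At that rate one 62.5 msec division spans 3000 samples, and the top octave stays below the 24 kHz Nyquist frequency.

```python
import math

FS_HZ = 48_000            # sampling rate quoted for the Plate M examples
F_LO_HZ = 16.0            # bottom of the displayed frequency range
N_OCTAVES = 10            # assumption: 16 Hz * 2**10 = 16,384 Hz, the nominal "16 kHz"
DIVISION_S = 0.0625       # width of one time division (62.5 msec)

# Octave division lines, bottom to top (exact powers of two times 16 Hz)
octave_lines_hz = [F_LO_HZ * 2**k for k in range(N_OCTAVES + 1)]

# Samples per time division at the quoted sampling rate
samples_per_division = round(FS_HZ * DIVISION_S)

def octave_position(f_hz):
    """Vertical position of a frequency on the log axis, in octaves above 16 Hz."""
    return math.log2(f_hz / F_LO_HZ)
```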

A.4.1 Pulse and Sine Wave

The  left  image  in  Plate  M  is  a  wavelet  transform,  using  the  Kiang  wavelet,  of  a  1  msec 
pulse  sampled  at  48  kHz.  Note  the  nulls  at  multiples  of  1  kHz,  reflecting  the  fact  that  the  Kiang 
wavelet  is  highly  oscillatory  and  thus  almost  orthogonal  to  pulses  that  extend  over  an  integral 
number of complete cycles. Note also that the finer scales locate the pulse very precisely in time, and that there are no windowing effects.
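The nulls can be traced to the pulse itself. A rectangular pulse of duration T has a Fourier magnitude proportional to |sin(πfT)/(πf)|. That magnitude vanishes at every multiple of 1/T, which here is every 1 kHz, so an oscillatory analyzing wavelet tuned near one of those frequencies responds only weakly. The sketch below is a plain FFT, not the Kiang transform. It shows the same zeros in the discrete spectrum of the sampled 1 msec pulse.

```python
import numpy as np

FS = 48_000                       # sampling rate (Hz), as in the text
n_pulse = round(FS * 0.001)       # a 1 msec pulse spans 48 samples

x = np.zeros(FS)                  # one second of signal, so FFT bin k is k Hz
x[:n_pulse] = 1.0
magnitude = np.abs(np.fft.rfft(x))

# |X[k]| = |sin(pi k / 1000) / sin(pi k / FS)|: zero at every multiple of 1 kHz
nulls = magnitude[[1000, 2000, 3000, 4000]]
between = magnitude[[500, 1500, 2500]]
```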

The  right  image  is  a  wavelet  transform  of  a  stepped  sine  wave.  Note  that  initial  transients 
quickly  give  way  to  a  very  tight  localization  of  the  center  frequency  of  the  sine.  Note  also  that  the 
transients  decay  more  slowly  for  the  coarser  scales  at  the  right  of  the  figure. 
A.4.2 Superposition Examples

The  left  image  in  Plate  N  is  a  wavelet  transform  of  the  sum  of  the  two  functions  in  Plate  M. 
It  provides  clear  evidence  that  a  single  transform  can  localize  transient  events  well  in  time,  while  at 
the  same  time  localizing  stationary  frequencies. 

The right image is a wavelet transform of two stepped sine waves of identical amplitude and similar frequency. It provides a dramatic visualization of beat effects: the peak pattern alternates periodically between two separate, lower-energy signals and a single, higher-energy signal. As the difference between the center frequencies increases, a proportionally larger part of each beat cycle is occupied by the region with two distinct frequency peaks. Note that this detailed beat structure is not typically present in sonograms, lofargrams, or waterfall displays. This visibility into time/frequency variations of signal structure is the advantage of the CWT.
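The beat pattern follows from a standard trigonometric identity for two sines of equal amplitude:

$$\sin(2\pi f_1 t) + \sin(2\pi f_2 t) = 2\cos\big(\pi (f_1 - f_2)\,t\big)\,\sin\big(\pi (f_1 + f_2)\,t\big)$$

The cosine factor is a slow envelope whose magnitude repeats every $1/|f_1 - f_2|$ seconds. For example, sines at 1000 Hz and 1010 Hz beat 10 times per second; these are illustrative values, not those used for Plate N. A wavelet whose bandwidth exceeds $|f_1 - f_2|$ cannot separate the two components and instead tracks this envelope.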

A.4.3 Noise Examples

The left image in Plate O is a wavelet transform of a white Poisson process emitting 1-msec pulses, with a mean interarrival time of 10 msec. Note the clear separation between pulses at the finer scales at the left of the image, in contrast to the random texture at the coarser scales at the right. Clearly, this signal has no structure that is localized in frequency.
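A test signal like the one in Plate O can be generated approximately as follows. This is a hypothetical reconstruction. The pulse timing follows the text: 1 msec pulses with exponentially distributed gaps of mean 10 msec. The unit pulse height, the 48 kHz rate borrowed from A.4.1, and the merging of overlapping pulses are assumptions.

```python
import numpy as np

def poisson_pulse_train(duration_s, fs=48_000, pulse_s=0.001, mean_gap_s=0.010, seed=0):
    """Unit-height pulses of width pulse_s starting at Poisson-process times.

    Interarrival times are exponential with mean mean_gap_s. Overlapping
    pulses merge into a single longer pulse of height 1.
    """
    rng = np.random.default_rng(seed)
    starts = []
    t = rng.exponential(mean_gap_s)
    while t < duration_s:
        starts.append(t)
        t += rng.exponential(mean_gap_s)
    x = np.zeros(int(round(duration_s * fs)))
    width = int(round(pulse_s * fs))
    for t_start in starts:
        i = int(t_start * fs)
        x[i:i + width] = 1.0
    return x, np.array(starts)

x, starts = poisson_pulse_train(20.0)   # 20 s of signal, about 2000 pulses
```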

The right image is a wavelet transform of a white Gaussian process. The apparent concentration of energy at the finer scales (higher frequencies) arises because the bandwidths of wavelets inevitably increase as they are contracted. Thus a process with equal energy at each frequency in a Fourier transform has, in a wavelet transform, a power that doubles with each octave, that is, a power that grows exponentially along the logarithmic frequency axis.
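The effect can be reproduced with ideal octave-band filters standing in for contracted wavelets. Each band in the sketch has the same unit passband gain and a bandwidth proportional to its center frequency, as in a constant-Q filter bank. The power of white noise passed by each band then roughly doubles from one octave to the next. This depends on the assumed equal-peak-gain scaling. With the energy-preserving $1/\sqrt{a}$ normalization of Appendix B, the variance of white-noise coefficients would instead be the same at every scale. The noise length, seed, and band edges are arbitrary choices.

```python
import numpy as np

FS = 48_000
rng = np.random.default_rng(1)
x = rng.standard_normal(2**20)                 # white Gaussian noise, about 22 s at 48 kHz
X = np.fft.rfft(x)
f = np.fft.rfftfreq(x.size, d=1 / FS)          # frequency (Hz) of each FFT bin

def octave_band_power(lo_hz):
    """Energy passed by an ideal unit-gain band [lo_hz, 2*lo_hz).

    Each band stands in for one contracted wavelet: equal peak gain,
    bandwidth proportional to center frequency (constant Q).
    """
    band = (f >= lo_hz) & (f < 2 * lo_hz)
    return np.sum(np.abs(X[band]) ** 2)

lows = [250.0 * 2**k for k in range(6)]        # bands from 250 Hz up to 16 kHz
powers = np.array([octave_band_power(lo) for lo in lows])
ratios = powers[1:] / powers[:-1]              # close to 2: power doubles per octave
```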
APPENDIX B

BASIC  MATHEMATICAL  ASPECTS  OF  THE  CWT 

This appendix presents a summary of some basic mathematical aspects of the Continuous Wavelet Transform (CWT). For a more detailed treatment, the reader is referred to Grossman et al. (1989) and Kronland-Martinet et al. (1987).

B.1 ANALYZING WAVELETS

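Suppose that the function g(t), generally complex-valued, satisfies the following conditions:

i) g(t) is square-integrable, that is, it has finite energy:

$$\int_{-\infty}^{\infty} |g(t)|^2\,dt < \infty$$

ii) g(t) satisfies the admissibility condition

$$\int_{-\infty}^{\infty} |\hat{g}(\omega)|^2\,\frac{d\omega}{|\omega|} < \infty$$

where $\hat{g}(\omega)$, its Fourier transform, is defined as

$$\hat{g}(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} g(t)\,e^{-i\omega t}\,dt$$

Then g(t) is called an analyzing wavelet.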

In practice, however, additional conditions are imposed, such as:

iii) its Fourier transform is differentiable; in that case, condition ii) implies that $\hat{g}(0) = 0$, i.e., $\int g(t)\,dt = 0$, and thus g(t) oscillates around zero;

iv) g(t) is well localized in time, and $\hat{g}(\omega)$ is well localized in frequency, both subject to the limitations imposed by the uncertainty principle.
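For a concrete check of conditions i) to iii), consider the Mexican-hat wavelet $g(t) = (1 - t^2)e^{-t^2/2}$. It is an illustrative choice, not the Kiang wavelet. In the unitary Fourier convention assumed here, its transform has the closed form $\hat{g}(\omega) = \omega^2 e^{-\omega^2/2}$. It therefore has zero mean, energy $3\sqrt{\pi}/4$, and an admissibility integral equal to exactly 1. The sketch confirms these values by quadrature.

```python
import numpy as np

def mexican_hat(t):
    # Mexican-hat wavelet: minus the second derivative of exp(-t**2 / 2)
    return (1 - t**2) * np.exp(-t**2 / 2)

def mexican_hat_ft(w):
    # Closed-form Fourier transform, unitary convention:
    # ghat(w) = (2*pi)**-0.5 * integral of g(t) * exp(-1j*w*t) dt
    return w**2 * np.exp(-w**2 / 2)

t = np.linspace(-20.0, 20.0, 40001)
dt = t[1] - t[0]
energy = np.sum(mexican_hat(t) ** 2) * dt        # condition i): finite (3*sqrt(pi)/4)
mean_value = np.sum(mexican_hat(t)) * dt         # condition iii): integral of g is 0

# Numerical Fourier transform at one frequency, to confirm the closed form
w0 = 1.5
ft_numeric = np.sum(mexican_hat(t) * np.exp(-1j * w0 * t)) * dt / np.sqrt(2 * np.pi)

# Condition ii): integral of |ghat|^2 / |w| over the real line.
# The integrand is even, so integrate over w > 0 and double; exact value is 1.
w = np.linspace(1e-6, 20.0, 200001)
dw = w[1] - w[0]
admissibility = 2 * np.sum(mexican_hat_ft(w) ** 2 / w) * dw
```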

The family of continuous wavelets associated with the analyzing (or mother) wavelet g(t) is defined by its dilates and translates, that is, by functions of the form

$$g^{(b,a)}(t) = \frac{1}{\sqrt{a}}\,g\!\left(\frac{t-b}{a}\right)$$

where b, the translation parameter, is a real number, and a, the scale parameter, is a positive real number. The factor $1/\sqrt{a}$ ensures that the norms of g(t) and $g^{(b,a)}(t)$ are equal (usually equal to unity). The norm of g(t) is defined as

$$\|g\| = \left(\int_{-\infty}^{\infty} |g(t)|^2\,dt\right)^{1/2}$$


B.2  THE  CONTINUOUS  WAVELET  TRANSFORM  OF  A  SIGNAL 


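The CWT of a real-valued, continuous, deterministic signal s(t) with respect to the analyzing wavelet g(t) is the function $S(b,a) = S_g(b,a)$. It is defined on the open half-plane of points (b, a), with b real and a > 0, as

$$S(b,a) = \int_{-\infty}^{\infty} \overline{g^{(b,a)}(t)}\,s(t)\,dt = \sqrt{a}\int_{-\infty}^{\infty} \overline{\hat{g}(a\omega)}\,\hat{s}(\omega)\,e^{i\omega b}\,d\omega$$

where $\overline{g}(t)$ denotes the complex conjugate of g(t) and $\hat{s}(\omega)$ is the Fourier transform of s(t).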
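The time-domain and frequency-domain expressions for S(b,a) can be checked against each other numerically. The sketch assumes the unitary Fourier convention and the Mexican-hat wavelet, whose transform is $\hat{g}(\omega) = \omega^2 e^{-\omega^2/2}$. It also assumes a Gaussian test pulse with a hypothetical width `SIGMA`. All of these are illustrative choices. The sketch evaluates both integrals by quadrature at one arbitrary point, (b, a) = (0.3, 1.7).

```python
import numpy as np

def g(t):
    # Mexican-hat analyzing wavelet (real-valued; illustrative choice)
    return (1 - t**2) * np.exp(-t**2 / 2)

def g_hat(w):
    # Its Fourier transform, unitary convention
    return w**2 * np.exp(-w**2 / 2)

SIGMA = 0.7  # hypothetical width of the Gaussian test pulse

def s(t):
    return np.exp(-t**2 / (2 * SIGMA**2))

def s_hat(w):
    # Unitary Fourier transform of the Gaussian pulse
    return SIGMA * np.exp(-(SIGMA * w) ** 2 / 2)

b, a = 0.3, 1.7
t = np.linspace(-40.0, 40.0, 80001)
dt = t[1] - t[0]
w = np.linspace(-40.0, 40.0, 80001)
dw = w[1] - w[0]

# Time-domain definition: integral of conj(g((t - b)/a)) / sqrt(a) * s(t) dt
S_time = np.sum(np.conj(g((t - b) / a)) * s(t)) * dt / np.sqrt(a)
# Frequency-domain form: sqrt(a) * integral of conj(g_hat(a w)) * s_hat(w) * exp(i w b) dw
S_freq = np.sqrt(a) * np.sum(np.conj(g_hat(a * w)) * s_hat(w) * np.exp(1j * w * b)) * dw
```

The two results agree to quadrature accuracy, which is what the Parseval relation for the unitary transform predicts.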

Two important properties of the correspondence between s(t) and S(b,a) are:

1) Independence of choice of time origin

If s(t) is shifted in time by $t_0$ (say, $s(t) \to s(t - t_0)$), then S(b,a) is transformed into $S(b - t_0, a)$.


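2) Conservation of energy

If s(t) is a signal of finite energy, then

$$\int_{-\infty}^{\infty} |s(t)|^2\,dt = \frac{1}{c_g}\int_{0}^{\infty}\int_{-\infty}^{\infty} |S(b,a)|^2\,\frac{db\,da}{a^2}$$

where, for a real-valued s(t) and the unitary Fourier convention ($1/\sqrt{2\pi}$ in front of the transform integral),

$$c_g = \pi\int_{-\infty}^{\infty} |\hat{g}(\omega)|^2\,\frac{d\omega}{|\omega|}$$

which is finite by the admissibility condition ii).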

Since the CWT of a signal is generally complex-valued, it can be expressed as

$$S(b,a) = |S(b,a)|\,e^{i\varphi(b,a)}$$

where |S(b,a)| is the modulus and φ(b,a) is the phase. Both provide information about the signal s(t). For instance, abrupt changes in the signal or its derivatives can, under suitable circumstances, be seen in its transform. Because a discontinuity contains high frequencies, a discontinuity at time $t_0$ appears as a localized increase in the modulus |S(b,a)| around $b = t_0$ at small a. A discontinuity is also indicated by the convergence of lines of constant phase toward the point $b = t_0$ on the edge (a → 0) of the (b,a) half-plane.
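The localized increase in modulus can be illustrated numerically. The sketch uses the Mexican-hat wavelet in place of the Kiang wavelet. Its test signal is arbitrary: sin t plus a unit jump at $t_0 = 2$. At the small scale a = 0.05, the modulus is concentrated within a few multiples of a of the jump. Away from the jump, where only the smooth part contributes, the modulus is about two orders of magnitude smaller. Because this wavelet is real, the modulus near the jump actually shows twin peaks at $b = t_0 \pm a$, with a near-zero at $t_0$.

```python
import numpy as np

def mexican_hat(t):
    # Real, zero-mean analyzing wavelet (illustrative choice, not the Kiang wavelet)
    return (1 - t**2) * np.exp(-t**2 / 2)

dt = 0.001
t = np.arange(0.0, 4.0, dt)
t0 = 2.0
s = np.where(t >= t0, 1.0, 0.0) + np.sin(t)     # smooth signal plus a jump at t0

a = 0.05                                        # a small scale
tk = np.arange(-500, 501) * dt                  # kernel support: +/- 10 a
kernel = mexican_hat(tk / a) / np.sqrt(a)       # g^{(0,a)} sampled; even and real
S = np.convolve(s, kernel, mode="same") * dt    # S(b, a) at every grid point b = t[n]

interior = (t >= 0.5) & (t <= 3.5)              # stay clear of the record edges
b = t[interior]
modulus = np.abs(S[interior])
b_peak = b[np.argmax(modulus)]                  # expected within a few a of t0
far = modulus[b <= 1.5].max()                   # away from the jump: only sin(t) contributes
```

Because the kernel is even and real, convolution equals the correlation in the CWT definition. This is why `np.convolve` can be used directly here.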

Many features of the signal that can be seen in S(b,a) are independent of the choice of analyzing wavelet; these features often involve the phase φ(b,a) (Grossman et al., 1989). In the Phase I effort, only information from the modulus was used. For Phase II, we plan to examine the phase information for the different faults to see whether it provides additional significant features for improving the classifier's performance.

Since the CWT is essentially a convolution between the signal and each wavelet, it can be computed straightforwardly by filtering the signal samples with an appropriate digital filter. The digital filter corresponding to the Kiang wavelet used in the Phase I effort has a causal, finite-dimensional realization.
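The filtering view can be sketched with FFT-based (frequency-domain) filtering rather than the causal recursive filter used in Phase I. The Mexican-hat wavelet again stands in for the Kiang wavelet. Each row applies the frequency response $\sqrt{2\pi a}\,\overline{\hat{g}(a\omega)}$, which is the transform of a scaled, conjugated, time-reversed copy of the wavelet. The FFT makes the record periodic, so a circular shift of the input produces the same circular shift of every row. This is a discrete counterpart of property 1) in B.2. The record length, sampling step, and scales are arbitrary choices.

```python
import numpy as np

def mexican_hat(t):
    # Illustrative real analyzing wavelet (not the Kiang wavelet)
    return (1 - t**2) * np.exp(-t**2 / 2)

def mexican_hat_ft(w):
    # Its Fourier transform, unitary convention
    return w**2 * np.exp(-w**2 / 2)

def cwt_fft(s, dt, scales):
    """Rows S[i, n] ~ S(b = n*dt, a = scales[i]), one filtering pass per scale.

    Each row multiplies the signal spectrum by sqrt(2*pi*a) * conj(ghat(a*w))
    and transforms back. The FFT treats the record as periodic, so keep the
    signal well inside it.
    """
    w = 2 * np.pi * np.fft.fftfreq(len(s), d=dt)   # angular frequency of each FFT bin
    s_fft = np.fft.fft(s)
    return np.array([
        np.fft.ifft(s_fft * np.sqrt(2 * np.pi * a) * np.conj(mexican_hat_ft(a * w)))
        for a in scales
    ])

dt = 0.01
t = np.arange(4096) * dt                      # 40.96 s record
s = np.exp(-(t - 20.0) ** 2 / 2)              # Gaussian test pulse centered at 20 s
scales = [0.5, 1.0, 2.0]
S = cwt_fft(s, dt, scales)                    # shape (3, 4096)
```

At b = 20 s and a = 1, the coefficient matches the direct time-domain sum of s(t) times the wavelet, which here equals $\sqrt{\pi}/2$.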

A  more  intuitive  treatment  of  the  CWT  and  examples  of  the  CWT  of  simple  signals  are 
presented  in  Appendix  A. 




TR-567 




